F^n5^T^<, X;*--V>y M^'X^F PR E G 2 0 OCiiTOtlT^
g& if > y h ^fuc iiip LT> -r- * fc&x- £ X p >y b ftfcWffiJg 
x- £ f; fc iifgMJ® £ L T V ^ ©*^«Sye # § J: 3 fcf & » L 

hl^X^F PR E G 2 0 0&C*&#fl2tl.3o cntiCCO'tffR^U^X^AV^A 

[0 0 5 8] 

2 2 OrtJct&SW^nSo l^X^y^ATJP^y^ 3 OteX*— ^?>y hUS^ 
X£F PRE G 2 0 0lCT^-fc!XT#£<DT\ ^n«§itKotl^f-^ 

Z >s>y >7 XhuV >7 2 3 0&HSS2 2 5£/TLTWlfcofc-r-#£l^X# 

gi57^-T>y hfcl^S^ U^X^/^^A^JD^'y * 2 3 0 
*fT9«k5K:#fi8 , r«2:fcti?#So 0>J*.l£> »ffittH3ii. abc-fc^-X 

V^A^nv^y * 2 3 OJcioTf^&nrciiAP-r— ^^iR^-rSfcfetc-r— ^X 
P-y Mi#il3 2 e >y hckDi/K e>frA#V\, 
[0 0 5 9] 

D-FXh7Wa-7 F4 2{^ f*— *ffi*US>X*/*>* 2 2 0&CP-F 



(24) Japanese Patent Publication (Tokuhyo) 2002-517038

?Z>r£tf?%:< > -r-**nVn 2 6 CD 1 o^tefeMtgW^X^A Ui>"X 
ffilAJdTa— 9*X-r-#X • WmWTs* F P S C R 2 1 0, KtSC 

iifc-e^So ^L^mmmic^^r, jx-ifx-r— *x • w®\sVt.z fps 

C R 2 1 0t±3.-W7^"feX-e#*«|j«lf V hfe^DrnXf-^xe-y 
[0 0 6 0] 

ltv^±i^ P-^h7»a-7 h 4 2&^n^j£i;fciMl^^2n, 

«f-^7- K^l/^^/^i' 2 2 OfribU^XZ/iytmtiviSv ^ 2 
4 O^LTP-FX h7MH-7 h 4 2^M^ffi2tl3 0 U^X^/^V^ttl 

a^fiJWrrSfci&fclF PR E G U^X# 2 0 OCOftmcTZ-izXT&o ^LTil 

^4 2K«l&t5o 
[00 6 1] 

»Wt§Mi©f-^XD7 F^f§ 0 LfrU£f^Ji<D&?3;lA^lS£ 

-7F4 2«FPREG l^X* 2 0 0©^I&, ST Ctl^Ol*)^* * 
UJC^LT^P»Uv ? X^/^^2 2 0F^CD#^^nfc«fSO-r-^Xn>y h 




[0 0 6 2] 

LfcAbt, i^tJX h 7 F i: Ltl/^X^^y ^ 2 2 0 
>y hOttanSHRftOT?, F P R E G UisXZ 2 0 0 <DFtyg£> K:«*fiStt3 

o 

CO 0 6 3] 

xny bft^&^£ftfcx-##^:/#«J-r§ fpreg i^x# 2 0 oofl 

fpreg uisxz 2 0 o<Dftm%mmmcmm^%rc&icm<Difci ! !%:m : T bft 
tufa £ ft i/ 1 ^ o mm% m ^ > l fc # o r ^ y £ x h r £ « * ^ v 

[0 0 6 4] 

, jiAPOi/Xf-AU^X^, MZ-l£ F P S C Rtx>?X£ 2 1 0, *<&mz.JfoVT 

ZXUy h^t§^X^/^^2 2 0 *_T 2* £ 0 3 3f@<7)-r-*X 

P <y h^X h 7^«2tlfcl^ FPREG U>?X* 200!i U^X*^ 
>^ 2 2 01*30 3 2{!cDT J -£Xn>y h ©|*|^£:±fc&Cp< ^ V lC&m~£tlZ> 0 Lfr 
U k>*X^^^Cf-^Dy hi^i£gx.£Sft£itIt iA(i*3 5, # 
WfeZfttcm^ n- FXf7Wa-y h 4 2«cn^ FPREG l^X^ 
2 0 0©rt§i:^^^>^2 2 OO-r-^Xny hf£ttt?ft< FPSCRU 
>>x£2 l 0O^t^tU^W^S^i:»§i:i:^f§o 
-fe y im 2 £ ft § ->XfA U -J 7s $ , tfij*. if n -tr y l^c J: o T^^M* 
^f§^LfcM^#^-r§^imu^X^, ^tt^d^tT^^o XhT^tc 
fe^TJiftSfm #IJxJ£3 7, ^#^£nfctf£\ o-FXh7»-7F 
4 2&Ctl£:> FPSCRl/^^2 10, FPREG^X^2 00i:k>^ 




[0 0 6 5] 

y?<Dfo®%$a-DT^Z>M&, F PRE Gls^TsZ 2 0 OOrt^^^UtC^ 
[00 6 6] 

ZtD&vftm^, F P R E G WTs* 2 0 0 OftSte /'ty ?<D\H&£ 

X £ & ftg £ C Tl&iWr § c £ t* # S o 

[0 0 6 7] 

&«©p-FlfM*&£WflKS^ ^*^o-feX^fft>nSo Ift^t, p 
- KXh7«iJia->y h 4 2 ttltiOf-na >y Ffc^-rsfgJfitfiP- F 
firft*&1tf%LZt. F 0 0©f«3^F PREG^X^2 0 

Otn-FL, p-Ff^"t?#5££*i;fcXPy Ft^Jti^^XfAU^^ 

©rt&*p— fu sejcfHRfiiof*— ^y— k*u^x^/^>^ 2 2 oofi^ 

F^tff^nfcf-^Xny h^3 3^§i^ F P R E G 
m*F P R E G l^X£ 2 0 CHCP—FStU ^fttettl^T 3 2{@<Dt :: -£XP 

3 5^51^, ±IB<DP fc 3^f£^T>S:< , F P S C R UvX.# 2 1 0©^fe F 
PSCRU^to-KSnS, «tttc, i^nfcf-^XP>yFt^3 7 

JS DTMT'feS Cfcti toTB^ 5 o 







CO 0 6 8 ] 

ifrftm%mmifci$-e&2>frE5fritm'<z> 0 mcs&^tc&sic, commit® 

[0 0 6 9] 

^n/ty- K»*^«*>Ha^*Jpjwf-rSo ^iWJicfc^-ai, f p r e g 

2 0 0 ^^JC^XxAU^X^^S^^C^-r^Tc^OS^i^fflt 
ft^tfiSLT, "7- K»*flSf»*e&S»£\ CMFPREG l^X* 2 0 0 
©^^t^tct^LT^D, 2 5^*5V>T, F 

PR E G l^X*©F^#n-K* hTftfiJte->y h 4 2taot»n5„ 

7° 3 3 OlcmtSo ^Ur7^3 2 0 tCfel^T?- K»#ffi»T?&3 £«r£n 
»fii7fy73 3 OteJUtfo 
[0 0 7 0] 

^^T^(/^ i^^TLfci^&^fu Xf7 73 4 0T»Mtil-r 
o L^L7- FtfcfcHf D^Ofe^tl ^*§£\ ^aati X -r >y f 3 3 2 y&h^wfe 

S^fsSttttSnSo ^"©^ 7T773 3 4(C|5^Ty-hW2 0tT^'J 
^>h$n, 7X77^3 3 6lc:|3^TU^X##*§tf 1 TSfcM U * > h $n 
tutc^fccbdtc, fgfia^Otl^^ k77^ttl^200f-^7n«y 




[0 0 7 1] 

ft£, 7f7^3 4 0T ; «iittB5o 
[0 0 7 2] 

Xf^3 1 Otc&^T, ^^ffijg^T^^V^fiJKff^nfc^, MS 
teXf-V?3 5 OtcB^St;y--Fi!(^-tfnJ:D WJ8ff-r?>o 

X-r>y^3 5 2lcMA,r\ t^T^2nft«cDl^x£#^\ 
^Jgx-^^^tM-To fLTXT7^3 5 4^4o^TV-F^ 1 ftW 
^'J^>hU Xf7^3 5 O^fcl^Tl^X^^VF^ 1 ftW^U^V 

-FWf^£-tfP<fcD*^fr£ -5 A^iJKFf-T^o *#^*§£\ »7-F^-tfntc 
^§lT»iDIto -tfmc&oft£>X7^y:/3 6 0 TMI^IStf tB£o 
[0 0 7 3] 

- * CDft&ttmfe-? 2> fcSbK&mte 7 * - V >y h fjf $6 £> * ^ U * ft « * ^ 

U^P>D-F-r§^-e^§i:^-r§o cco^tttCcfcD, U^X£i*P§cD'if 

[0 0 7 4] 

C073&te£ft> 3- Fl^TBiJcD^^ ^oTU^X^rt^O'lt^^rn- F L 

fc o mi l tz o -r & ^'it^m-r § 0 u * i^^o-iffg^ a - f l ft d x f 




[0 0 7 5] 
[0 0 7 6] 

VF P v 1 ttARMrn-t7^^a-/Hcffflt^3^n-t7^i:ltiI 

-r^^oKmrn^nrcnm^m^yxj-h (fps) <dt-*t-^-¥t&3o 

^l^iAt?Ci:^T#§o &%W*V7 *T*&r>TmmcDM?t'P l EEE 
7 5 4 5l!tt«t§i:i:A^t5 0 C<D{±mWte^-W iLT$5&rJV7 h 

^i/^f-h^ot^ i e e e 7 5 4 Km'&*mj&-?z>& 5 mmznr 

[0 0 7 7] 

z t igm&T- $ m (D^m & > v - x ^ 5 > f 3 y n -t? >y -9- x ^- x -e®jf? -r 

§ 2 -cxD^^-eji^ £ n § o 

[0 0 7 8] 

VFPv 1 7-+f^f t©#f I^T©i:feDo 

• F>>x7(Cfcl>T^- hn- F^fot I E E E 7 5 4 £^±&U%i 

LTT FbXli^nTtg 

• 1 6l«gbi>X^ 0 M^V-X^7yF$/c««^X^i: 
bt7 FUX^nTtg 0 ({gffiJtl/v ? X^MM#ISJ^U'> ;? X^i:fi^§) 




• 8m<Dffimmmmui?**frz%:%4'D<D^>tr, 4 Sterns u 

h^rtu^O I E E E 7 5 4 S&tt&SWiflSaHf n * V Tt&tlO^'ftifr 

• i e e e 7 5 4S&i£©&5j|gjii£ai?\ ^^c^^y^nTo&^n 

• FFTOSIZ^HotC, C+ + 1S&T$} a v affltCf?Kj/jN»^e> 

[0 0 7 9] 

f«#fc3u ^^M-K^xTTVFPv 1 ^HitSC^fctfSU /v-F 
^xT^^-hn-F^rffi^^TfiJffl-r^CiitT^^o VFPv 1 te^tc 

[0 0 8 0] 

2. Sffifflfg 

Automatic exception (@W : WWDl'PfcS. CtUZZn^tKOm 

[0 0 8 1] 
[0 0 8 2] 

Bounce 0S^>X) : sd-^U— -r-r ^^X^AtCfg^^nfcWt 5 ^ 
oT, ^--^ h ^ >y ^/ n ^ F ^ m "T c ^ < § ^ « rL- n - F OIE^ 

[0 0 8 3] 

CDP:' Coprocessor Data Processing FPS(^»j/Mfc;£^X^A)©*§-£\ C D P 




[0 0 8 4] 

ConvertToUnsignedlnteger(Fm) :FmF*9©rtg£ra^ft L 3 2 t: >y ME^WiHc^ 

jft-rsc ^ 0 *ejny\ 3 2 try h^#ftbfiffe©ttHn©s»>JMR^iio«»^L 

46fe«ktf«tiftV>©fea6<05ti«>*— KJCflc#*TSo I N V A L I D Wl^tiSlft/JMR 
[0 0 8 5] 

ConverToSignedInteger(Fm):FmOF*3§^:?g : ^3 2 If <y hgE^fitC^SIt* S C 

to 3 2 try h^#{«»oieHn©f?i&/j^^ffios#i5tL«)fe«fc^ 
ai/^fei&oAfet— f^^#-t§o i n v a l i D®mte&W)'hmj&xtimtp 

[0 0 8 6] 

ConvertUnsignedIntToSingle/Double(Rd): 3 2 tfy F^^LISli: LTM 

mznzkmisvxz (Rd) ©rt$*^»fis^tt«iiijs?¥i&/i^8s^«K:s»-r 

[0 0 8 7] 

ConvertSignedIntToSingle/Double(Rd): 3 2 l£y b?^§& Lfiftffli: LT8?S! 
SnSARMl/^ (Rd) Ort$*#»jS*fe(4««|jS»S&'JMBuSltk:aE» 

ncto gaHaarr? inexac t^w 

[0 0 8 8] 

Enin Emin 

Denornialized value(^IEMb^nrdfi):(-2 <x<2 )<DlBffl©fil^"r 

mitrntefflRw-euT*, ymev m* 1 o i e e e 7 5 4- 1 

[0 0 8 9] 

Disabled exception(^ik£ nfcflJflO :FPCSRrt©IIfia^30IJ^*— V 




KttLX, I E E E 7 5 4 f±^^M^tl^^#IEH/^m^ebTl/^o 0»W 

h 3- F-\/^ >X LT I E E E 7 5 4 1?£ig£*l 
/c*SJH£r£&m-ro ^J^(i:a-fl^A> F^^iig^ti^^o 
[0 0 9 0] 

Enabled exception(a^£tlfc#IJ*10 Z>®m-( ^-^^v btfWzWL 
%>o mn£kft%:5£$L-?%ffimzy-*°- h-3- F^^>XLt I E E E 7 5 4 T' 
[0 0 9 1] 

Exponent mm •.m^rcWLWimm^.t^tcisbKZ^^t^ 
[0 0 9 2] 

Fraction (/J* iBt^nS 2 }I/M^O£Wc&;§>signif icand(^ji^) 
[0 0 9 3] 

Emln Emtn 

Flush-To-Zero Mode: C(D^~ b*Vit, iUbfcmc (-2 <x<2 )©IBHtC 
[0 0 9 4] 

High(Fn/Fm):^^UrtT^^nfdg5WSfiiO±{5 3 2 £>y h [63:32] 
[0 0 9 5] 

IEEE754-1985: "IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE Std 754-1985,
The Institute of Electrical and Electronics Engineers, Inc., New York, New York, 10017.
U* UiflEEE 7 5 4##tif[i*tl§^




[0 0 9 6] 

Infinity: IEEE 7 5 4 CD#$fe:7 *-Vy h^M^oo^g^ 
Wsignif icand (MW&m tf^T-tfn&CDT'g^fcfcSo 
[0 0 9 7] 

Input exception: 13- Z.Z>nrzWm<Drcib<D 1 "DUfclZmSKDjr^'yFtf''*— 
[0 0 9 8] 

I ntermed i a te resu 1 1 HJg^) : ^1,46 £ itu tc tUMSS^^rtft-r § 46 ffl 1/ ^ 
rtSP7*-Vy ho C©?*- V«y F lUSMj^y *— V y F «k K> t^f^M7 
-C F £ s i gn i f i cand (W$J«0 7 -f F % *f"f S C £ t & S 0 
[0 0 9 9] 

Low(Fn/Fm) : * * 'J rtT*^£nfcte»ftfli©~Rfc 3 2 If >y F [31 :0] 
[0 10 0] 

MCR:" A RMU^X^^Sn^nty^l" 0 F P S ©*§£\ HtlKlZA 
RMU^XZtF P S l/^X^t^tf-^f/ciWU^X^^t^f 
^t$nSo 1 OOMCR^7^^fot3 2 If >y F ©1i$B©#£r'ffcj£'?" 

[0101] 

MRC :" n^y^SARMl/^^i". FPSOi^, ontC 
^tft&o 1 OOMRC^7X^j$oT3 2 If v F ©fit $6©#£:fsiM-r § C 
[0 10 2] 

N a N :«F?-ett*< , J?I&/J^^7*-*T»y F^n- KftStttelE^fl^if* 
o N a NlCfct 2O0^^m§ : OTfe«fe^M^*/fct4»±o fH^N aNS 
, ^^^FilLT^ffl-r^^Invalid Operand 0ft8&:fr'^ > F)0!W£:£ U 

§0 NaN©?*— httM7^-;PFWt 1 ©-tf nT&l^signif icand( 




[0 10 3] 

Reserved (^^^) : 7^ — ;l/ Fj&V V^'J^ >f— S' a <fc oT^ttStl 
"^Sf*" ?K 7 — ;l/ FOFtygtfHf n^l^ £ UNPREDICTABLE (^$"J^rJ 

[0 10 4] 

Rounding ModeGtltf)*— K) : I E E E 7 5 4i±ffilZ, ~$-^T<DstWtfMM<D 
*)*tWbftttn(drft^ftV»o d©flS*<te^©»ST^-rtCt±, significand 

btf urcr < £<> i e e e 7 5 4»& 4 -dcdm 

t-Ffcjg^bTVS : 0^HAt§ (RN) h\ ■£nfc:*L«>f:fcfcJ:tfje>» 
Tl. (RZ) ^6-F, IEcDW^c^l463 (RP) F\ ZLXn.commtt 
M% (RM) KT?*So «^©^~F«^^T^46, 
significand(Wa^)©STffilfy hjb^oK:38:SOT?&nar«It)±tf Ttt^= 
* rfflffej ic-TSo ZSa^^-FiisigniflcandCW^^Ofefflfl^Vf «y h£r 
HMW^m^D^T^o i:tiaiM^*5^tC, C++, m'Java 

[0 10 5] 

Significand(W«^) : 2 Ji88ift/|MBU^»03>^-*> FT*&9> ^©Bf 

& 3 f — ;l/ F t frb & 3 o 
[0 10 6] 

Support Code : I E E E 7 5 4fi£f&fc©5&tt«r#*.3;te 




Wtm^-t^^X^^o & ? 1 F5tD^T\ I E E E 7 5 4 tC 

^jBKH*fe*4 1 Oit-r— ^ ^^^)(Dl^] *»^X5 a V- F L&tttU;? 

o 

[0 10 7] 

T r a p(h77^).: fWtl(0»^-7;Hf7 F^F P S CRt-brh 
[0 10 8] 

UNDEFINED (*$£«) : *^i^h7 7^4?^§^©c tTfe§„ A 
RM$MfC|W&f¥LVv|i$glco^TteARM Architectural Reference Manuals 

[0 10 9] 

UNPREDICTABLE (MJ^nJti) HIITeff ^©!l6*Sfeti*JWU^ 
[0 110] 

Unsupported Data (+r#— F •gtlT^ft^T— : a-^xT^MIS 

^^F^xrT^±^^fc«gp^tc-9-4?-F-r§*\ *sv^4i«jwaai*^ 




[0 111] 

3. 1x^X^*7 7 

3 . 1 i^iJi 

[0 112] 

igm&T*— 5 s\W%&&£Sl 0 £S1 1 <DFW&±«#-r£ 0 nvW^ 

^r^tr^y/v nrnfuv^iz. WT.^mr^^-ot^ yf) ^yf— 

[0 113] 

VF P v l tiX;*^*— FSfcti'** h- FT cti t><Dlsi?X$lcT?-te 
XT*#S<fcaicLTI/^o X^^r-FtC&^Tfci: 1 2 Of; fete 3 
5>FU^X^^oT^Cfc^^e^U^X^tC##)A*nS„ £/c-^ 
h ;l/^r- F fc: *3 V ^ T «\ mfe £ Hfc y F # lx^X * (D ^^r#^-r § 

*tLT, ^fdg5|ije^^>FcDit^«^4-3(D^^^Lr-<^ h ^MH^tr 




Table 1: LEN bit encoding

LEN    Vector length encoding
000    Scalar
001    Vector length 2
010    Vector length 3
011    Vector length 4
100    Vector length 5
101    Vector length 6
110    Vector length 7
111    Vector length 8
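
The LEN table above amounts to "vector length = LEN + 1", with LEN = 000 selecting scalar operation. A minimal Python sketch of that decoding follows. The function names `vector_length` and `is_scalar_mode` are illustrative only and do not come from the specification.

```python
def vector_length(len_field: int) -> int:
    """Decode the 3-bit LEN field into an element count, per the LEN table."""
    if not 0 <= len_field <= 0b111:
        raise ValueError("LEN is a 3-bit field (0..7)")
    # 000 -> 1 element (scalar), 001 -> 2 elements, ..., 111 -> 8 elements
    return len_field + 1


def is_scalar_mode(len_field: int) -> bool:
    """LEN = 000 means the operation works on a single element."""
    return vector_length(len_field) == 1
```

For example, `vector_length(0b011)` gives 4, the value implied by the table's ordering for the 011 row.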



[0 114] 

^ F;V^— F*ttHfnJMft©ffi*L E N7^-;V F&c»#jMyc £&c£oTffF 
nl^n^o L E N7>f- /l/F&C0#AoT^S£> F P S teXTJ^- FTKlfP 

^X#£fc&l 6<BO^?Sfilx^X^*7'FUX}i^'r5Ci:i:»PiE^nSo L 
EN7^-;V F#4fn-£&l/^§-S\ F P S \t<^Z FA/*— F^lfrfFU U^X* 

3- Flcoi^Ttta 1 *&ffl<D£ £ 0 
[0 115] 

W7s%>%y>7 (S0-S7^ft(iD0-D3) &C&£&£>^ F;W8- F&C& 
[0 116] 

3.2 mmftu^x^o^ffl^a 

F P S C RF^CDL E N7^- ;l/ F#0 3 2{@<W¥f? J^Uv'X* S 0- 

S3 i*M£fflT-#5o cneuyx^fov^n-efe^Tt^fctite^u^x^ 






31..0   31..0   31..0   31..0
S0      S8      S16     S24
S1      S9      S17     S25
S2      S10     S18     S26
S3      S11     S19     S27
S4      S12     S20     S28
S5      S13     S21     S29
S6      S14     S22     S30
S7      S15     S23     S31



[0 117] 

F P S CRrt<DLEN7^-;VK^0J:H^l/^ Uv y X^77-Y;Hi 
0 2 <fc -5 Zti^titf 8 ffitDlfH U^x ^ £ * § 4 ? £ LT^ 

KrTSo ^ b>lUisXZ<Dm 1 /^^V0-V7liX^7^>*X^ S 0-S 7 






[Figure 2: single-precision registers S0/V0 through S31/V31 drawn as four banks of eight registers (S0-S7, S8-S15, S16-S23, S24-S31); several labels are illegible in the source]

[0 118] 

tLF P S CRrtOL EN#3fc:-trv hSttTVntf, ^hMl 
0*#B8-rsi:U^X^S 10, S 1 K SI 2*3J;tfS 1 3#"S* MWSJffc: 
IboK^o IrISHC, V 2 2^#It5t S 2 2, S2 3, SI 6 43<ktfSl 

fc, V7{:i<I/^^aV0t'$S„ [5l«^V8tiV 1 5<D^cM#, VI 6 
&V2 3tcM#, fLTV2 4aV3nci<o 
[0 119] 

3 . 3 ism m z <D&mm 

FPSCR^LEN7^f — ;l/ Fsb* 0 ©Ji^ 1 6 fflCHgagX*7 l^X # 






63..0   63..0
D0      D8
D1      D9
D2      D10
D3      D11
D4      D12
D5      D13
D6      D14
D7      D15

Figure 3: double-precision register file



[0 12 0] 

F P S C R P^CO L E N7^-;VF^0J:D t>7v#l/^§£\ 4^@©X^^L^v ? X 



[Figure: double-precision registers D0/V0 through D15/V15 drawn as four banks of four registers (D0-D3, D4-D7, D8-D11, D12-D15); most labels are illegible in the source]
•£?isB>o (Dmm&vm t^rnic i%m& u^x^t^oo/^^rt -crass




[0 1 2 1] 

3. 4 isvx#mmfe 

[0 12 2] 

^U^X^SO-S 7 i: It, f /-c^fiHOi^I/^X ^ D 0 - D 3 b t 

• ScalarD = OP2 ScalarA or ScalarD = ScalarA OP3 ScalarB or ScalarD = ScalarA * ScalarB + ScalarD

• VectorD = OP2 ScalarA or VectorD = ScalarA OP3 VectorB or VectorD = ScalarA * VectorB + VectorD

• VectorD = OP2 VectorA or VectorD = VectorA OP3 VectorB or VectorD = VectorA * VectorB + VectorD

[0 12 3] 

3. 4. 1 xti^mn 

[0 12 4] 

1 ? F P S C Rl^DLEN^-^F^Oo ^M9t^5 ^XS&^^J ^ 
[0 12 5] 

^tem<DUi/*$(Dl^nV%> <fcV\, CO-t— Hfi F P S C RI*J(DLEN"7^— ^ 
[0 12 6] 

3.4.2 ^ Ml/fra^^o, MMSiMjt^ofc^l? 




LOT-Ft«i« F P S CRrtOL EN7Y-;l/F*HfnJ;Dfe:rt;£< 

MtlX ts^V 2teUi?X$yr^ >UDmW<D/ *\y>7 l^X # T» fc 

SDOU^^tt^ftlfeVe c t o r Bte^ffl*TSC ^ 
tEX^I^X^V e c t o r B©^y^-t$5^§V^V e c t o r D 
#LENgi£«fc9JgVvg£TV e c t o r B [C^r—s^ y ^Lt^ii^ ^O^P 
i&«^iH^We*5o VectorDtVector B&IrIC^ h 

•fe * ^ 3 > 0 U -r-y;l^^#Mo 

[0 12 7] 

3.4.3 h^x-^O^^o/cM^ 

cot— F*POi»3it4. FPSCRrt©L EN7>r- ;l/F#Hfnj;9£>;*:^< 

nSo Vector F;KD<B*<Dg^fi V e c t o r B tC*3^a*fJES*rS 

SHilll-a-^-iienTVe c t o r DlC^lf&asftSo l^X $ 7 T -OKDJEMKJ 
<D'*>#ftK%:\,^lsi?7>#&\,^M V e c t o r Ajb^fflTt, t^t©^^ 
hMVe c t o r Bte^ffl*Z?#5 0 2Si©i^i:llS{i:, n^jt^ hfrt 

ffi&fc*? hj]/o^nfrtfLENmm&o&m^&zv*-s^>yy°?%m& 

Mo 

[0 1 2 8] 

3.4.4 SWU-VUf-^I/ 






LEN        Reg     Reg     Reg     Operation type
0          any     any     any     S = S op S or S = S * S + S
non-zero   0-7     any     any     S = S op S or S = S * S + S
non-zero   8-31    0-7     any     V = S op V or V = S * V + V
non-zero   8-31    8-31    any     V = V op V or V = V * V + V
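
The register-usage rows above can be read as a small decision procedure. The sketch below is one such reading, under two assumptions: the first two Reg columns are taken to be the destination register and one source register (the column labels are not legible in this copy), and single-precision registers 0-7 are taken to form the scalar bank. `classify_single` and its parameter names are hypothetical.

```python
def classify_single(len_field: int, dest: int, src: int) -> str:
    """Classify a single-precision three-operand operation.

    Assumption: `dest` is the first Reg column and `src` the second;
    the third Reg column is "any" in every row, so it is not needed.
    """
    for reg in (dest, src):
        if not 0 <= reg <= 31:
            raise ValueError("single-precision registers are numbered 0..31")
    if len_field == 0 or dest <= 7:
        # LEN = 0, or a destination in registers 0-7: scalar operation
        return "S = S op S"
    if src <= 7:
        # destination in 8-31, source in 0-7: scalar applied to a vector
        return "V = S op V"
    # destination and source both in 8-31: vector operation
    return "V = V op V"
```

A call such as `classify_single(3, 12, 2)` then selects the scalar-with-vector form.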



Table 3: single-precision two-operand register usage

LEN        Reg     Reg     Operation type
0          any     any     S = op S
non-zero   0-7     any     S = op S
non-zero   8-31    0-7     V = op S
non-zero   8-31    8-31    V = op V
LEN        Reg     Reg     Reg     Operation type
0          any     any     any     S = S op S or S = S * S + S
non-zero   0-3     any     any     S = S op S or S = S * S + S
non-zero   4-15    0-3     any     V = S op V or V = S * V + V
non-zero   4-15    4-15    any     V = V op V or V = V * V + V

Table 5: double-precision two-operand register usage

LEN        Reg     Reg     Operation type
0          any     any     S = op S
non-zero   0-3     any     S = op S
non-zero   4-15    0-3     V = op S
non-zero   4-15    4-15    V = op V



[0 12 9] 

4 . rfjH*-fe V h 
F P S#^« 3O0i3fJ'J ^^SijT^^o 

• MCRfcMRC : ARMiF P S t(Dm<DmM 

• LDC^STC : FP S t^^EV M©n-K^<kD*Xh7 

• cDP:f- 

[0 13 0] 

4 . 1 tfHfrEWJtt 




f p s r^*T?rv{tm<DMmK(F2 -do) u^frtf&Zo ^^T^^ymm 

[0131] 

4 . 2 ^<Dmwt 

FPSm ARM^ftHtf^-r^TO^^T-r§^T\ iJfc^tl^ 
FPSlC&tf ^ji^bfr^fci: : 

• F MO V X :^Sj/h^^X^AU^X^^^LTMiAt?^fcMiAt? 
CO 1 3 2] 

•T^^-e^lfr^n^o System ID Register (FPSID) lctt~? ^FMOVXtefttrT 
Ift/ht!(^^^J:^T^trc^jn^J:^TP^^n§o User Status and Cont 
rol Register (F P S C R)^MbT(FM0VX^oT)OMiA^/^M/«iA^^ 
ffoT^m^yb (FPSCR [4 : 0])£* U TTZ £ £WV%Z> 0 

[0 13 3] 
4.3 mfcf-Z1t®.5&& 
&M'hm&tm®. : r-2 t<Dm<D^m&, F P SK::fc^T2X-r>y:/:7 0 p-fex 

v&Zo ?&fr%mm.7 i -$*fa5T : -2$mifc^£&m*ft5 c d p^^p> 

[0 13 4] 

4.3.1 f p s \s*j^$(DmmT-zfrhmh'bWL&?—z^(vm& 

SESfr— MCR FMO V S^fotA RMl/^^A^»l! 

^miijgui/x^^\o-F-r^c^^-e#§o fbtFPs v*j7,znomwL : r 




&mcm&-&tixffi£9c f p s ve?x ^ic«#iAi ns,' SM^Mft^Jf -a- 

L3 2 tf >y hS^-T^C t^T^§ 0 
[0 13 5] 

4.3.2 F P S U^^©»»/jNSfe^7*-^*>eSE»'r-^^©aSI 

nfcS^fi^-^ffiigu^x^tcAti^n^o iif-^{iMRc fmov 

[0 13 6] 

4.4 k>*x^77^KD7 Fl'XKJc^r^-feX 

;l/KF*9<D5 tT >y b^? 0 ±<ft 4 fcT>y hit F ru F mSfcti F d t 

[0 13 7] 

iglSX^-X(S= 1 )P«3-el&f^"r5^tt^^> FT Kl/X©±ffi4 £>y h 
©*?:ffflt5o cn^4lf7MiFn > F m*5 J; tfF d 7^— ;l/ K ICS'S n 3 
o N> M N *5<fctfDlfv MliS-rS^^^F^Y— ;l/FtC^^>FT F 

[0 13 8] 

4.5 MCR (ARMI/^X^^P.nyn-b'yt^i) 

MC RimiA RMl/^^rt©f-^^F P SJCfco T(SSaS* /"ctifieffl-TS 






Bits:    31-28  27-24  23-21   20  19-16  15-12  11-8   7   6   5   4   3-0
Field:   COND   1110   Opcode  0   Fn     Rd     101S   N   R   R   1   (illegible)


m& MC R My Y 7 4 — JV ycDfem 


My h 7 4 - 

K 




Opcode 


3 tfy h&n^- K(*7#flH) 


Rd 




S 


0 : ^«ftf^5i^ K 

1 : 


N 


l&f&fby'i? X*<DMTtiL My h 


Fn 


OOOO-FPID (ayPtytlDSf) 
000 1-FPSCR (^ — D 1 ^^ — ^^*5 it^faj 
^Vv?**) 

0 100-FPREG (l'i'^^77'f^^l' 
^**> 


R 


My h 






Table 7: MCR opcode field definitions

Opcode field   Name     Operation
000            FMOVS    Fn = Rd (32 bits, coprocessor 10)
000            FMOVLD   Low(Fn) = Rd (lower 32 bits of a double-precision register, coprocessor 11)
001            FMOVHD   High(Fn) = Rd (upper 32 bits of a double-precision register, coprocessor 11)
010-110        (entry illegible)
111            FMOVX    System Reg = Rd (coprocessor 10 space)



[0 13 9] 

a : 3 2 e «y hf- ^IIO^FMO V [S, HD, LD] fJM^CfcoTU-tf 
-h^nt^So ARMl/^^l fcWffijg Ix^X * (Dt^ * F MO V 

T SRU FMOVLDtFMOVH Dt&W^n** nT¥ftt±¥ftlt&M-? % 



o 

[0 14 0] 

4.6 MRC (3/Pty^/Jt«W^l/^^^^ARMl/^X^ 
MRCl&miF P S ^X^Of-^^A RMl/^X^^t^o CM*. 
, &5W4{gfa®F P S U^X*£2 0<DA RMU^X^^iU CPSR© 



Bits:    31-28  27-24  23-21   20  19-16  15-12  11-8   7   6   5   4   3-0
Field:   COND   1110   Opcode  1   Fn     Rd     101S   N   R   M   1   Fm

Figure 6: MRC instruction format
Table 8: MRC bit fields





mm 






Opcode 


3fc*y hFPS»*3-K(*9#I) 


Rd 


ARMlEilTc* VS?** 3 w- p 


S 


1 . /^V jfcsk pte A A o v . i f 


N 




M 




Fn 


uut/u r r I U \^ s & \Z y y \ LJ w J 

0001-FPS C R (^ — -if^T—fXioX-tfffl 
0 100-FPREG (^^^^77^^^!?^ 
{&<D U i? X ? <D * — Kfi^^^^-Cfc?) N 


Fm 


flies' h 


R 





*FMOVX FPSCR^Ci^ LR d 7-f^V KCR 1 5 



(1 1 ll)^AoTV^5*?), CPS R£>_h{44 E> hfi#?>nfc^# 






Table 9: MRC opcode field definitions

Opcode field   Name     Operation
000            FMOVS    Rd = Fn (32 bits, coprocessor 10)
000            FMOVLD   Rd = Low(Fn): the lower 32 bits of Dn are transferred (double precision, coprocessor 11)
001            FMOVHD   Rd = High(Fn): the upper 32 bits of Dn are transferred (double precision, coprocessor 11)
010-110        (entry illegible)
111            FMOVX    Rd = System Reg



&:MCR FMOV^©ai2?r#I. 



[0 14 1] 

4.7 LCD/STC(n- P S U^^) 

LDC^S T Cf&ft&F P S fc**U tOSTf-^^tS. *?l&'JMRj& 

7 s - ^ /*c MifccDx- £ fe^C «t o T V ^ tl<D)fit JgT *ga£t? # § o Witt 
hSnt^S, LDCtS TCO^S-fi^-^^aVO^^Ol/^T^^l 1 £r#M 

o 

LDCtST C#^0-7*— T>y h JfcHiW 7 tejSVTo 



Bits:    31-28  27-25  24  23  22  21  20  19-16  15-12  11-8   7-0
Field:   COND   110    P   U   D   W   L   Rn     Fd     101S   (illegible)

LDC/STC instruction format






—/V h 


mm 


P 




U 


Jb/ T If y h ( 0 = T, 1 =±) 


D 




W 


h /< y ? ) 


L 


^ffpjfcfy h(0=.X p7, l=a— K) 


Rn 


ARM-^ — 3 — h j 


Fd 


: 

fc:^^ /fi£4£4k U-l/ X >$r T" Kl/X 


S 


0 : K 

1 : iftS^^K 




FLDM (IA/DB) *5 <fc T>'FSTM (TA/DB) {d*J- U "C ftsj^-f ^ # ^ 
fftl8 If y h*7^y h&ttt1t%.1fem\'i?X#& ( 

fi&m^ i/^^^tsc^ 2^) „ i m<nmm<nMi±<ny- 



[0 14 2] 

4.7.1 D-F^xhr»f^fcBB*rs-««fii«jS 

[0 14 3] 




^X^77-<;l/7^-77 h#^<Di|iJg<D I E E E 7 5 4 y*— V<y h 






w 



y 



FLDM<condXS/D>Rn. < U 'J * # ]} 7> 
FSTM<condXS/D>Rn. < V *J * # V X 



u — K/* h 
T t A* 7"? 



-8<0 O y h7>f-/V Km fit 3 2 tf y MEjg&^SctfSAo-CV^ 



it h ]) y 2 



m : 

FLDMEQSrl2, {f8-fll} ; r 1 2 (OT FU^fah 4 US <D 4^<D 
fpl^^^^s8, s9 x s 10^n-Kt5. r 1 2li^^. 
FSTMEQDr4, {f0} ; d 0 1 <73fgr)fif g £ r 4 <7)T K U^^^ h T 
r4lM„ 



7° 1 feig : K/X hT-TSo Rn<Z>/tf* W 
ir7^ h y ^ h V o 



y ? * 



0 



1 



y" 



FLDM<cond>IA<S/D>Rn!. <US?;* y" V 
FSTM<cond>IA<S/D>Rn!. ^v^y* D 



n— K/* h 



RnKi&ttS&IgT - K/* hTL, ft 

|©feS©§l^©r KV*SrRn^5>r h^y ?o t7tyF7^ 
— /U KfcWt 3 2 tfy h$Ei£<£>!&aSAo Rit-O y << V > ? 

ISt7-fe7 h * 4 0 a— K^^^://Md;fcV^T^a££;ft,fc&*7 — 
K&tel 6 0 UtTy 1 (--fe y h L # ^ft # <b & V> 0 :mtt£<£> 

^ y^v K-rs-tfem, ^fc^^/u^tbf^misv^Tig 



« : 

FLDMEQIASrl3 ! , {fl2-fl5} ;r 1 307Kl/^*7 4 ffil )ft f8 J* S: 4 
@Of pl/i^X^s 1 2, si 3, s 1 4 , s 1 5^P-Kt5. 
r 1 3 mk<D1r — ? *?n1rT K U ^ •Cjgjf-t'S, 



7* y * * fe 9 , 5^h^y^4U 






FLD<condXS/D>[Rn. #+/-offset], F 
d 

FST<condXS/D> [Rn. »+/-of f set] , F 
d 



y "fe y h 
9 d •— K/ 
* FT 



1 us^^ ^ Sr 

*> 9 , Rn 



(U=l) *»-€rn*»foabll< (U=0) :i(U97 



FSTEQDf4, [r8, #+8] ; 3 2 (8*4) ht7t y h Lft r 8© 



1 


1 




FLDM<cond>DB<S/D>Rn!. <Vi?^$ V 


y° v ? & v 








X h> 


* > b # *> 








FSTM<cond>DB<S/D>Rn ! . < V 'J X # V 












T /V f- 7° 













Mtttf-^i? y h * 4 



— K/^h7t5. Rn©7KV7^7yT^y> 

-Cfof?, Rnri><bML3l< (U=0) o Kfi 
^7 h7t5^\ sfi©#MH** y*a»6>»— K 



09 : 

FSTMEQDBSr9 ! , {f27- 
r 9 fcAoTV^. 



f29}; s2 7, s2 8, s 2 9 <b 3 f@ 



[0 14 4] 

4.7.2 LDC/S TCij{W7iJ 
m.1 2(iLDC/S TC^n-FlCfettSP, W, U If >y h Of^^tl^I 






m\2 LDC/STCifft-7!) 



p 
r 




1 1 
U 


*q y t / p y 

< / ^ P 


TP 


0 


0 


0 






u 


U 


i 
i 


*y >y ^> /v/ y 


1 LL/IVI/ r v> I III 


0 


l 


0 






0 


l 


I 


w ^ ;* * # ? 


FLDMTA/FStMiA 


1 


0 


0 




FLD/FST 


1 


0 


1 


a-:7ir y F 


FLD/FST 


1 


1 


0 


U is X 9 # ? 
h 


FLDMDB/FSTMDB 


1 


1 


1 







[0 14 5] 

4.8 C D PCnyn-tryHt-r-^M) 

iflii^tt?. fmac (mm-mwtf^fttfi rcmn-e 

V&% 0 CCDWinte, I E E E^i6^^3#icD^-^^y F^AP^^M^m^ 
^ttTtft>n^^T\ Bi&l^fliJPII¥^ll£teS&3o cmc£K>, J a van— H 

« f ma cmflb&fijffl-rs c «fc mnvmcijantzmnz o *>mmnm 

[0 14 6] 

C D P 2 ~D<D4fr<frl* F P S b'^X^rtcD^Sj/J^^ji^r^cD^ 

ffl^^-r§±TlStCi£0 0 FFTOUI [S/D]fcfc, FPSCRrtO^tf^L 

ti&Lmm^mirZo fftosi es/d] ^w^^o^^fa 0 

FFTOUIZ [S/D] tFFTOSIZ [S/D] & CM^T? m 
ifflFPSCR^-FWlClt, «e^h^DitT§o FFTOS 
IZ [S/D]©«ffi*i»»/^a^^6.S0RK:Sgift , rsi6K:c, C++*5<fctfJ 




a v a?S^Sn5o F F TO S I Z [S/D] rfH^&F P S C RA^RZST? 

fctb<DV^^JVi3^> h^F F TO S I Z [S/D] »*©"9->f ^lo&tf^ He 

f-ei^b, 4-6-9- -r^;i/gp,^-r§o 

[0 14 7] 

MiKiCDP CMP^, ^tUcM< MR C FMOVX FPSCR 
1*M**teoTfTV\ mZftfcF P S-7^ytT>y h(F P S C R [31 :28])* A R M 

CPS R77^tf7 Men- K-rs 0 J**MW*fcfc, ttM*^7> F<D 1 Otf N 
aNTfeS^ I NVAL I D (^)^MOnTtltt^fe^ C fc&^C t 
5o F C M P i: F C M POtt, Mt^7 ^ F© 1 0*<N a N t^Sl^ IN 
VAL I D(M^)^£-£%</^\ FCMPE^FCMPE 0 teWm^^* 
o FCMPOfcFCMPE 0 F m7^ Fl^©t^7 > F t 0 fc^Mb, 
WKCl/c^btF P S77^-fr«y ht5o ARM77^N, Z, C, V 

& FMOVX FPSC Ri$i5(Dmc&n : <D£vicmm-£nz>o 

N : less than
Z : equal
C : greater than, equal, or unordered
V : unordered

C D P #<fr£D!7 * -V >y h « «J 8 tc^f o 



Bits:    31-28  27-24  23  22  21  20  19-16  15-12  11-8   7   6   5   4   3-0
Field:   COND   1110   Op  D   Op  Op  Fn     Fd     101S   N   Op  M   0   Fm

Opcode[3] = bit 23, Opcode[2:1] = bits 21-20, Opcode[0] = bit 6






mi 3 CDPIfyh7^ 1 — J\* h'MB 



— >\> K 




Opcode 


4 if sr h f p s mn =» - k (* i 4 &m) 


D. 


Old -fey h L&tf*itf&bftv> 


Fn 


te^TcA^** 4 tf ;/ hSfcl* 

mmTtAUitxpT k * * & « 


Fd 


*&3k9c\'*J*9 4 tr y h 


S 


0 : ^^^--i^^ K 

1 : HSI^^K 


N 




M 




Fm 





[0 14 8] 
4.8.1 

gl 4 C D P - F^fxto t^T«--t-7^n- 

[OPERATION] [COND] [S/D] £ ^ -5 B%$l2>o 






Table 14: CDP opcode specification

Opcode field   Name     Operation
0000           FMAC     Fd = Fn * Fm + Fd
0001           FNMAC    Fd = -(Fn * Fm + Fd)
0010           FMSC     Fd = Fn * Fm - Fd
0011           FNMSC    Fd = -(Fn * Fm - Fd)
0100           FMUL     Fd = Fn * Fm
0101           FNMUL    Fd = -(Fn * Fm)
0110           FSUB     Fd = Fn - Fm
0111           FNSUB    Fd = -(Fn - Fm)
1000           FADD     Fd = Fn + Fm
1001-1011      (entry illegible)
1100           FDIV     Fd = Fn / Fm
1101           FRDIV    Fd = Fm / Fn
1110           FRMD     Fd = Fn % Fm (parenthetical note illegible)
1111           Extend   (see Table 15)
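
The arithmetic in the opcode table can be written out directly. The sketch below is a minimal Python illustration of the legible rows, not a model of the hardware: it uses ordinary Python floats, ignores rounding modes and exceptions, and omits FRMD (whose note is illegible) and the extended opcode 1111. `CDP_OPS` and `execute_cdp` are hypothetical names.

```python
# Map each 4-bit opcode to (mnemonic, function of Fd, Fn, Fm) as listed in the table.
CDP_OPS = {
    0b0000: ("FMAC",  lambda fd, fn, fm: fn * fm + fd),
    0b0001: ("FNMAC", lambda fd, fn, fm: -(fn * fm + fd)),
    0b0010: ("FMSC",  lambda fd, fn, fm: fn * fm - fd),
    0b0011: ("FNMSC", lambda fd, fn, fm: -(fn * fm - fd)),
    0b0100: ("FMUL",  lambda fd, fn, fm: fn * fm),
    0b0101: ("FNMUL", lambda fd, fn, fm: -(fn * fm)),
    0b0110: ("FSUB",  lambda fd, fn, fm: fn - fm),
    0b0111: ("FNSUB", lambda fd, fn, fm: -(fn - fm)),
    0b1000: ("FADD",  lambda fd, fn, fm: fn + fm),
    0b1100: ("FDIV",  lambda fd, fn, fm: fn / fm),
    0b1101: ("FRDIV", lambda fd, fn, fm: fm / fn),
}


def execute_cdp(opcode: int, fd: float, fn: float, fm: float):
    """Return (mnemonic, new Fd value) for one scalar operation."""
    if opcode not in CDP_OPS:
        raise ValueError(f"opcode {opcode:04b} is not covered by this sketch")
    mnemonic, operation = CDP_OPS[opcode]
    return mnemonic, operation(fd, fn, fm)
```

Keeping the table as data rather than as a chain of conditionals makes it easy to compare each entry against the corresponding row.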



[0 14 9] 

4. 8. 2 fi£3gigt££ 

gl 5l±iBJ¥=i— F7-f— ;l/KOExtend(ffi§8ffi)*^aiK3Bffr^*^'ro "T^ 
TOffMH* [OPERATION] [COND] [S/D]CDJ£3:^&^ it^Mfcfc £0 F L S C Bi^fr 

^yf'^Xtm (Fn [3:0], N} fclflCfc^tcLTf^&nSo 






Table 15: CDP extended operations

Extend field   Name             Operation
00000          (illegible)      Fd = Fm
00001          FABS             Fd = abs(Fm)
00010          FNEG             Fd = -(Fm)
00011          FSQRT            Fd = sqrt(Fm)
00100-00111    (entry illegible)
01000          FCMP*            Flags := Fd compared with Fm
01001          FCMPE*           Flags := Fd compared with Fm, with exception reporting
01010          FCMP0*           Flags := Fd compared with 0
01011          FCMPE0*          Flags := Fd compared with 0, with exception reporting
01100-01110    (entry illegible)
01111          FCVTD<cond>S*    Fd (double-precision encoding) = Fm (single-precision value, converted; coprocessor 10)
01111          FCVTS<cond>D*    Fd (single-precision encoding) = Fm (double-precision value; remainder illegible)
10000          FUITO*           Fd = convert unsigned integer to single/double (Fm)
10001          FSITO*           Fd = convert signed integer to single/double (Fm)
10010-10111    (entry illegible)
11000          FFTOUI*          Fd = convert to unsigned integer (Fm) {current RMODE}
11001          FFTOUIZ*         Fd = convert to unsigned integer (Fm)
11010          FFTOSI*          Fd = convert to signed integer (Fm) {current RMODE}
11011          FFTOSIZ*         Fd = convert to signed integer (Fm) {RZ mode}
11100-11111    (entry illegible)
CLAIMS



[Claim(s)] 

[Claim 1] A data processing apparatus comprising: a register bank having a plurality of registers; and an
instruction decoder responsive to at least one data-processing instruction that specifies a vector operation,
the vector operation executing a data-processing operation a plurality of times using data values from a series
of registers in said register bank; wherein said register bank contains at least one register subset, said
series of registers lies within said subset, and said instruction decoder controls said series of registers so
that the series wraps around within said register subset.

[Claim 2] The apparatus according to claim 1, characterized in that said vector operation performs said
data-processing operation using a plurality of data values corresponding to a plurality of series of registers,
said register bank contains a plurality of register subsets, each of said plurality of series of registers lies
within a respective subset, and said instruction decoder controls said plurality of series of registers so that
each series wraps around within its respective register subset.

[Claim 3] The apparatus according to claim 2, characterized in that said plurality of subsets are disjoint.

[Claim 4] The apparatus according to any one of claims 1, 2 and 3, characterized in that said subset comprises
one group of consecutively numbered registers.
[Claim 5] The apparatus according to claim 2, characterized in that each of said plurality of subsets comprises
one group of consecutively numbered registers.
[Claim 6] The apparatus according to claim 5, characterized in that said plurality of subsets comprise a
plurality of contiguous register groups, each group consisting of consecutively numbered registers.

[Claim 7] The apparatus according to claim 6, characterized in that said plurality of subsets comprise four
contiguous register groups.

[Claim 8] The apparatus according to any one of claims 1 to 7, further comprising a memory and a transfer
controller that controls the transfer of data values between said memory and registers in said register bank,
characterized in that said transfer controller, in response to multiple transfer instructions, transfers a
series of data values between said memory and a series of registers in said register bank.
[Claim 9] The apparatus according to claim 6, characterized in that each register group is addressed through
an incrementer that wraps around between the two ends of the register group.
[Claim 10] The apparatus according to any one of claims 1 to 9, characterized in that said series of registers
is a series of consecutive registers.

[Claim 11] The apparatus according to any one of claims 1 to 10, characterized in that said register bank and
said instruction decoder are part of a floating-point unit.

[Claim 12] A data-processing method comprising the steps of: storing data values in a plurality of registers
of a register bank; and, in response to at least one data-processing instruction that specifies a vector
operation, executing a data-processing operation a plurality of times using data values from a series of
registers in said register bank; wherein said register bank contains at least one register subset, said series
of registers lies within said subset, and said series of registers wraps around within said register subset
during said execution.
[Claim 13] The data-processing method according to claim 12, characterized in that said vector operation
performs said data-processing operation using a plurality of data values corresponding to a plurality of series
of registers, said register bank contains a plurality of register subsets, each of said plurality of series of
registers lies within a respective subset, and said plurality of series of registers wrap around within their
respective register subsets during said execution.
[Claim 14] The data-processing method according to claim 13, characterized in that the data values in one
series of registers are tap coefficients of a filter, and the data values in another series of registers are
signal values to be filtered by said filter.
[Claim 15] The data-processing method according to claim [...], characterized in that two or more vector
operations are performed on the data values in said plurality of series of registers while the start point of
at least one series of registers is changed for each vector operation.
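
The wrap-around addressing and the filter use described in the claims can be illustrated with a short Python sketch. It assumes a bank of 32 registers divided into four groups of eight consecutively numbered registers, which matches the single-precision layout elsewhere in this document but is not fixed by the claims. The names `register_sequence` and `vector_multiply_accumulate` are hypothetical.

```python
from typing import Iterator, List

SUBSET_SIZE = 8  # assumption: four groups of eight registers in a bank of 32


def register_sequence(start: int, length: int,
                      subset_size: int = SUBSET_SIZE) -> Iterator[int]:
    """Yield `length` register numbers from `start`, wrapping inside the group."""
    base = start - (start % subset_size)  # first register of the group
    for i in range(length):
        # the incrementer wraps from the last register of the group to the first
        yield base + (start - base + i) % subset_size


def vector_multiply_accumulate(regs: List[float], coeff_start: int,
                               signal_start: int, length: int) -> float:
    """Sum coefficient * signal over two wrapping series of registers."""
    acc = 0.0
    for c, s in zip(register_sequence(coeff_start, length),
                    register_sequence(signal_start, length)):
        acc += regs[c] * regs[s]
    return acc


# Usage: tap coefficients in registers 8-11, signal values in registers 16-23.
regs = [float(i) for i in range(32)]
regs[8:12] = [1.0, 1.0, 1.0, 1.0]
first = vector_multiply_accumulate(regs, 8, 22, 4)   # signal regs 22, 23, 16, 17
second = vector_multiply_accumulate(regs, 8, 23, 4)  # start point moved by one
```

Moving only the start point between operations, as in the last two lines, reuses the signal values already held in the bank, so nothing has to be reloaded or shifted.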



DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

This invention relates to the field of data processing. More particularly, it relates to a data processing
system that has a register bank and supports vector operations.

[0002] 

It is known to provide data processing systems that have a register bank and support vector operations.
Examples of such systems are the Cray 1 and the DEC MultiTitan processor.

[0003] 

The Cray 1 processor has separate vector and scalar register banks. When the operation code of the instruction
being executed indicates a vector operation, a series of data values is returned from the vector register bank
according to a length value stored in a length register and a mask stored in a mask register. The length value
specifies how many data values the series contains, and the mask specifies which of the data values in the
vector register named in the instruction are returned.
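
As a rough illustration of the length-and-mask selection just described, the Python sketch below returns the masked elements from the first `length` values of a vector register. It assumes that bit i of the mask corresponds to element i, which the text does not state. `select_vector_elements` is a hypothetical name and not part of any real instruction set.

```python
from typing import List, Sequence


def select_vector_elements(vector_register: Sequence[float],
                           length: int, mask: int) -> List[float]:
    """Return the elements chosen by `mask` among the first `length` values.

    Assumption: bit i of `mask` selects element i of the vector register.
    """
    if not 0 <= length <= len(vector_register):
        raise ValueError("length exceeds the size of the vector register")
    return [vector_register[i] for i in range(length) if (mask >> i) & 1]
```

With a length of 4 and a mask of 0b0101, elements 0 and 2 are returned and any element beyond the fourth is ignored.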
[0004] 

The MultiTitan processor has a single register bank whose registers can be used either as scalars or as
vectors. The instruction itself includes, for each register it specifies, a flag indicating whether the register
is a scalar or a vector, and a length field indicating the number of data values in the series when a vector
register is used.
[0005] 

Vector instructions are desirable in themselves, because a single instruction can specify multiple
data-processing operations and code density can therefore be increased. Digital signal processing, such as audio
and graphics operations, is particularly well suited to vector operations, because it frequently needs to apply
the same operation to a series of related data values, for example in a filter operation that applies the tap
coefficients of a digital filter to a series of signal values.
[0006] 

It is also desirable to perform data-processing operations as quickly and efficiently as possible. One way to
improve speed and efficiency is to avoid the need to reload or reposition data values that are already stored
in the register bank. A problem in achieving this is that instruction code able to reuse data values tends to
become long and complicated. If many instructions are needed to specify the required operation, processing
slows down and the purpose of reusing the data values in the register bank is defeated.
[0007] 

Instead of using a general-purpose processor such as the Cray 1 or the MultiTitan, it is common to
give a special-purpose digital signal processing circuit dedicated functions that support a small
number of digital signal processing operations. In such special-purpose circuits, the general
approach is to store the required data values in a large memory and to fetch the data values needed
for each operation as required. The data values do not need to be reloaded into, or repositioned
within, the large memory, because the order in which they are used is controlled by manipulating
the addresses used to access the memory. The problem with this approach is that the circuit must be
specially designed to match the operations to be performed, and so it lacks the flexibility and
ease of integration with other functions that are obtained when a more typical general-purpose
processor is used.
[0008] 
An object of this invention is to provide efficient and fast data processing using a register bank
and an instruction decoder that supports vector operations, while maintaining the flexibility of a
general-purpose processor.
[0009] 

According to one aspect, a data processing apparatus according to this invention comprises a
register bank having a plurality of registers, and an instruction decoder responsive to at least
one data processing instruction specifying a vector operation, in which a data processing operation
is executed multiple times using data values from a sequence of registers in said register bank.
Said register bank contains at least one register subset, and said sequence of registers lies
within said subset. Said instruction decoder is characterized in that it controls said sequence of
registers so that the sequence wraps around within said register subset.
[0010] 

Because the wrap-around takes place within a register subset containing fewer registers than the
whole register bank, compact code that reuses data values in the register bank can be written
without reloading or moving the data values. Because the required wrap-around is performed in
hardware, the instruction code can start from a different point in the subset each time it is used,
and so can process the data values in a different order without using extra instructions to split
up the sequence of vector registers. Furthermore, while a vector operation is performed on a
register subset that wraps around on itself, data transfers to registers outside the subset can be
performed at the same time. Register wrap-around also provides hardware support for ring (circular)
buffer arrangements, in which data is written into a buffer and read out of it again at points that
chase each other round and round the buffer.
[0011] 

Although there may be only a single wrap-around register subset, the following arrangement is more
advantageous. Said vector operation performs said data processing operation using data values from
a plurality of sequences of registers. Said register bank contains a plurality of register subsets,
and each of said sequences of registers lies within a respective subset. Said instruction decoder
controls said sequences of registers so that each sequence wraps around within its respective
register subset.

Using a plurality of wrap-around register subsets is desirable because digital signal processing
operations often need to reuse data values from two sequences of registers (for example, an FIR
operation or a matrix operation that multiplies and accumulates taps and signal values while
changing the offset between them).
[0012] 

Although the subsets could overlap, in practice the data values that need to be reused in such
situations are usually separate, so the subsets may be disjoint. This makes the hardware
implementation simpler, which is convenient.
[0013] 

The subsets could also consist of registers interleaved with registers that are not in these
subsets. However, programming and implementation are easier if each subset is a group of
consecutively numbered registers.
[0014] 

Although the register groups could be separated from one another within the register bank, it is
preferable for them to be contiguous, because the available register space can then be used more
efficiently.

[0015] 

The ability of this invention to use vector operations more effectively is complemented by the
following preferred embodiment. The apparatus further comprises a memory and a transfer controller
that controls the transfer of data values between said memory and the registers in said register
bank, and said transfer controller responds to a multiple transfer instruction by transferring a
sequence of data values between said memory and a sequence of registers in said register bank.
[0016] 

The ability to transfer data values to a block of registers in the register bank complements the
ability of this invention to use vector operations efficiently, because after a block of registers
has been reused several times the block can be replaced with a single instruction.
[0017] 

The division of the register bank into predefined register subsets, and the use of sequences of
registers in vector operations, can be implemented efficiently in a preferred embodiment in which
each register group is addressed through an incrementer that wraps around between the two ends of
the register group.
[0018] 

The sequence of registers used for a vector operation can take many forms, such as every other
register within a subset. However, the most generally useful form is one in which the sequence
consists of consecutive registers.
[0019] 

The above technique can be used in any processor that has a register bank and supports vector
operations. However, the ability to keep code compact and to reuse data values in registers has
been found to be particularly useful, without conflicting with other design considerations, in
embodiments in which the register bank and the instruction decoder are part of a floating-point
unit.
[0020] 

According to another aspect, a data processing method according to this invention comprises the
steps of storing data values in a plurality of registers of a register bank and, in response to at
least one data processing instruction specifying a vector operation, executing a data processing
operation multiple times using data values from a sequence of registers in said register bank.
Said register bank contains at least one register subset, and said sequence of registers lies
within said subset. The method is characterized in that said sequence of registers wraps around
within said register subset during said execution.
[0021] 

This technique is particularly useful for efficiently performing FIR filter operations, in which
the relative offset between the tap coefficient values and the signal values changes with each
vector operation and these values are reused several times.

An embodiment of this invention is described below, by way of example, with reference to the
drawings.
[0022] 

Figure 1 shows a data processing system 22, which contains a main processor 24, a floating-point
unit co-processor 26, a cache memory 28, a main memory 30, and an input/output system 32. The main
processor 24, cache memory 28, main memory 30, and input/output system 32 are connected through a
main bus 34. A co-processor bus 36 connects the main processor 24 to the floating-point unit
co-processor 26.
[0023] 

In operation, the main processor 24 (also called the ARM core) executes a stream of data processing
instructions that control general data processing operations, including interactions with the cache
memory 28, main memory 30, and input/output system 32. Co-processor instructions are embedded in
the stream of data processing instructions. The main processor 24 recognizes these co-processor
instructions as instructions that should be executed by an attached co-processor. The main
processor 24 therefore issues these co-processor instructions on the co-processor bus 36, from
which they are received by any attached co-processor. The floating-point unit co-processor 26
accepts and executes any received co-processor instruction that it detects is intended for it. This
detection is made through the co-processor number field within the co-processor instruction.
[0024] 

Figure 2 shows the floating-point unit co-processor 26 schematically in more detail. The
floating-point unit co-processor 26 contains a register bank 38 made up of 32 32-bit registers
(fewer are shown in Figure 2). These registers can operate individually as single precision
registers, each storing a 32-bit data value, or in pairs, each pair storing a 64-bit data value.
The floating-point unit co-processor 26 contains a pipelined multiply-accumulate unit 40 and a load
store control unit 42. In appropriate circumstances, the pipelined multiply-accumulate unit 40 and
the load store control unit 42 operate concurrently: the pipelined multiply-accumulate unit 40
performs arithmetic operations (including multiply-accumulate and other operations) on data values
in the register bank 38, while the load store control unit 42 transfers data values that the
pipelined multiply-accumulate unit 40 is not using to and from the floating-point unit co-processor
26 through the main processor 24.
[0025] 

Within the floating-point unit co-processor 26, a received co-processor instruction is latched in
an instruction register 44. In this simplified drawing, a co-processor instruction can be regarded
as consisting of an operation code followed by three register specifying fields R1, R2, and R3 (in
practice these fields may be split up and distributed in various ways across the instruction). The
register specifying fields R1, R2, and R3 correspond to the registers in the register bank 38 that
serve, respectively, as the destination, the first source, and the second source of the data
processing operation being performed. A vector control register 46 (which may be part of a larger
register that also performs additional functions) stores a length value and a step (stride) value
for the vector operations performed by the floating-point unit co-processor 26. The vector control
register 46 can be initialized and updated with a length value and a step value by a vector control
register load instruction. Because the vector length and step values apply globally within the
floating-point unit co-processor 26, they can be changed dynamically on a global basis without
resorting to self-modifying code.
[0026] 

The register control and instruction issue unit 48, the load store control unit 42, and the vector
control unit 50 can be regarded as together performing the main part of the function of an
instruction decoder. The register control and instruction issue unit 48 responds to the operation
code and the three register specifying fields R1, R2, and R3 by first outputting an initial
register access (address) signal to the register bank 38, without decoding the operation code and
without using the vector control unit 50. Because the initial register values can be accessed
directly in this way, execution can start early. If a vector register is specified, the vector
control unit 50 generates the sequence of register access signals using a 3-bit incrementer (adder)
52. The vector control unit 50 addresses the register bank 38 according to the length value and
step value stored in the vector control register 46. A register scoreboard 54 is provided to
perform register locking, so that the pipelined multiply-accumulate unit 40 and the concurrently
operating load store control unit 42 do not cause data consistency problems (the register
scoreboard 54 can also be regarded as part of the register control and instruction issue unit 48).
[0027] 

The operation code in the instruction register 44 specifies the type of data processing operation
to be performed (for example, whether the instruction is an add, subtract, multiply, divide, load,
or store). This is independent of whether the registers specified are vectors or scalars, which
simplifies instruction decoding and the set-up of the multiply-accumulate unit 40. The first
register specifying value R1 and the second register specifying value R2 together encode the
vector/scalar nature of the operation specified by the operation code. The three common cases
supported by the encoding are S=S*S (for example, the basic random arithmetic generated by a C
compiler from a block of C code), V=V op S (for example, scaling the elements of a vector), and
V=V op V (for example, matrix operations such as an FIR filter or a graphics transformation). Here
"op" denotes a general operation and the syntax is destination = second operand op first operand.
In addition, some instructions (for example, a compare with zero or an absolute value) have no
destination register (the output is the condition flags) or have fewer input operands (a compare
with zero has only one input operand). In such cases more operation code bit space is available to
specify options such as the vector/scalar nature, and the full range of registers can be made
available for each operand (for example, a compare instruction may always be fully scalar, whatever
the register).
[0028] 

The register control and instruction issue unit 48 and the vector control unit 50, which together
perform the main part of the function of an instruction decoder, determine and control the
vector/scalar nature of the specified data processing operation according to the first register
specifying value R1 and the second register specifying value R2. When the length value stored in
the vector control register 46 indicates a length of 1 (corresponding to a stored value of zero),
this can be used as an early indication of a purely scalar operation.
[0029] 

Figure 3 is a flow chart showing the processing used to decode the vector/scalar nature from the
register specifying values in single precision mode. In step 56, a test is made as to whether the
vector length is globally set to 1 (equivalent to a length value of zero). If the vector length is
1, all registers are treated as scalars in step 58. In step 60, a test is made as to whether the
destination register R1 is within the range S0 to S7. If it is, the operation is entirely scalar
and has the form S=S op S, as shown in step 62. If step 60 returns NO, the destination is
determined to be a vector, as shown in step 64. When the destination is a vector, the encoding
treats the second operand as also being a vector. The two possibilities remaining at this stage are
therefore V=V op S and V=V op V. These two options are distinguished by the test in step 66, which
determines whether the first operand is one of S0 to S7. If it is, the operation is V=V op S;
otherwise it is V=V op V. These conditions are recognized at steps 68 and 70, respectively.
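The decision flow of steps 56 to 70 can be illustrated in software. The following is only a minimal sketch: the function name and the representation of registers S0 to S31 as the integers 0 to 31 are assumptions made for illustration, and the sketch takes the actual vector length rather than the encoded length value held in the vector control register 46.

```python
def classify_single_precision(vector_length, dest, first_operand):
    """Return the operation form for single precision mode.

    dest and first_operand are register numbers 0..31 (S0..S31).
    vector_length is the actual length (1 means purely scalar).
    The function name and argument layout are illustrative only.
    """
    if vector_length == 1:      # steps 56 and 58: everything is scalar
        return "S=S op S"
    if dest <= 7:               # steps 60 and 62: destination in S0..S7
        return "S=S op S"
    # Step 64: the destination, and hence the second operand, is a vector.
    if first_operand <= 7:      # steps 66 and 68: first operand in S0..S7
        return "V=V op S"
    return "V=V op V"           # step 70
```

For instance, a destination of S8 with a first operand of S2 and a vector length of 4 is classified as V=V op S, whereas the same registers with a vector length of 1 are classified as S=S op S.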

[0030] 

When the vector length is set to 1, all 32 registers of the register bank 38 can be used as
scalars, because the scalar nature of the operation is identified at step 58 without relying on the
test of step 60, which would restrict the range of registers that can be used for the destination.
The test of step 60 is useful for identifying entirely scalar operations when vector and scalar
instructions are used together. Also, when operating in a mixed vector and scalar mode, the first
operand is one of S0 to S7 if it is a scalar, and one of S8 to S31 if it is a vector. Providing
three times as many registers in the register bank for a vector first operand is a good match for
the generally larger number of registers needed to hold sequences of data values when vector
operations are used.
[0031] 

A common operation that a user may wish to perform is a graphics transformation. In the general
case, the transformation can be represented by a 4*4 matrix. Because the operands are reused in
such a calculation, it is desirable to store the matrix values in registers that can be treated as
vectors. Similarly, an input pixel value is usually stored in four registers, which can also be
treated as a vector so that it can be reused. The output of the matrix operation is usually scalars
(accumulations of the separate vector row multiplications) stored in four registers. To hold two
sets of input values and two sets of output values, 24 (16+4+4) vector registers and 8 (4+4) scalar
registers are needed.
[0032] 

Figure 4 is a flow chart corresponding to Figure 3, but for double precision mode. As stated above,
in double precision mode the register slots in the register bank 38 operate in pairs and store
sixteen 64-bit data values in logical registers D0 to D15. In this case the encoding of the
vector/scalar nature of the registers is changed from that of Figure 3: the tests of steps 60 and
66 become, in steps 72 and 74 respectively, "is the destination one of D0 to D3?" and "is the first
operand one of D0 to D3?".
[0033] 

Encoding the vector/scalar nature of the registers within the register specifying fields as
described above saves considerable instruction bit space, but it creates some difficulty for
non-commutative operations such as subtraction and division. When the register configuration is
V=V op S, the lack of symmetry between the first and second operands of a non-commutative operation
can be overcome, without additional instructions to swap register values, by extending the
instruction set with pairs of operation codes, such as SUB and RSUB or DIV and RDIV, that represent
the two different operand orderings of the non-commutative operation.
[0034] 

Figure 5 shows how vectors wrap around within subsets of the register bank 38. In single precision
mode, the register bank is divided into four ranges of registers with addresses S0 to S7, S8 to
S15, S16 to S23, and S24 to S31. These ranges are contiguous and disjoint. Referring to Figure 2,
the wrap-around function for these subsets of eight registers can be provided by using the 3-bit
incrementer (adder) 52 in the vector control unit 50. When a subset boundary is crossed, the
incrementer wraps back. This simple implementation is made possible by aligning the subsets on
eight-word boundaries in the register address space.
[0035] 

Referring again to Figure 5, several vector operations are shown to help in understanding the
register wrap-around. In the first vector operation, the start register is S2, the vector length is
4 (indicated by a length value of 3 in the vector control register 46), and the step is 1
(indicated by a step value of zero in the vector control register 46). Therefore, when an
instruction that refers to register S2 as a vector is decoded with these global vector control
parameters set, the instruction is executed four times, using the data values in registers S2, S3,
S4, and S5 respectively. Because this vector does not cross a subset boundary, no vector
wrap-around takes place.
[0036] 

In the second example, the start register is S14, the vector length is 6, and the step is 1. In
this case the instruction is executed six times, starting from register S14. The next register
used is S15. When the register is next incremented by the step value, the register used wraps
around to S8 instead of S16. The instruction is executed a further three times, completing the full
sequence S14, S15, S8, S9, S10, and S11.
[0037]

In the last example of Figure 5, the start register is S25, the vector length is 8, and the step is
2. The first register used is S25, followed by S27, S29, and S31 according to the step value. After
register S31 has been used, the next register value wraps back to the start of the subset, passes
over register S24 because the step is 2, and the operation is performed using register S25. The
incrementer 52 can take the form of a 3-bit adder that adds the step to the current value when
moving between vector registers. The step can therefore be adjusted by supplying a different step
value to the adder.
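The behaviour of the 3-bit adder in the three examples of Figure 5 can be modelled with a short software sketch. This is an illustration under stated assumptions, not the circuit itself: the function name and its parameters are hypothetical, registers S0 to S31 are represented by the integers 0 to 31, and the length and step arguments are the actual values rather than the encoded values held in the vector control register 46.

```python
def vector_registers(start, length, step=1, subset_size=8):
    """List the registers touched by one vector operand.

    Only the low bits of the register number pass through the adder,
    so the sequence wraps around inside the aligned subset of
    subset_size registers that contains the start register.
    """
    mask = subset_size - 1          # 0b111 for an eight-register subset
    base = start & ~mask            # upper bits never change
    offset = start & mask           # low bits fed to the 3-bit adder
    regs = []
    for _ in range(length):
        regs.append(base | offset)
        offset = (offset + step) & mask   # a carry out of bit 2 is dropped
    return regs


print(vector_registers(2, 4, 1))    # [2, 3, 4, 5]
print(vector_registers(14, 6, 1))   # [14, 15, 8, 9, 10, 11]
print(vector_registers(25, 8, 2))   # [25, 27, 29, 31, 25, 27, 29, 31]
```

Dropping the carry out of the low three bits is what confines the sequence to the subset. It relies on each subset starting at a register number that is a multiple of eight.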
[0038] 

Figure 6 shows wrap-around in the register bank 38 in double precision mode. In this mode, the
register subsets are D0 to D3, D4 to D7, D8 to D11, and D12 to D15. The minimum value input to the
adder serving as the incrementer 52 in double precision mode is 2, which corresponds to a double
precision step of 1. A double precision step of 2 requires 4 to be input to the adder. In the first
example shown in Figure 6, the start register is D0, the vector length is 4, and the step is 1. The
resulting vector register sequence is D0, D1, D2, and D3. Because no subset boundary is crossed,
there is no wrap-around in this example. In the second example, the start register is D15, the
vector length is 2, and the step is 2. The resulting vector register sequence is D15 and D13.
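The double precision case can be sketched in the same way. The sketch below assumes, for illustration only, that logical register Dn occupies the pair of 32-bit slots 2n and 2n+1, so that the same 3-bit adder operates on slot numbers and is fed twice the double precision step. The function name is hypothetical.

```python
def double_vector_registers(start_d, length, step=1):
    """List the D registers touched by one double precision vector operand.

    Assumes Dn maps to 32-bit slots 2n and 2n+1 (an illustrative
    assumption), so the adder input is 2 * step slots.
    """
    slot = 2 * start_d
    base = slot & ~0b111            # upper bits of the slot number are fixed
    offset = slot & 0b111           # low three bits go through the adder
    regs = []
    for _ in range(length):
        regs.append((base | offset) // 2)
        offset = (offset + 2 * step) & 0b111
    return regs


print(double_vector_registers(0, 4, 1))    # [0, 1, 2, 3]
print(double_vector_registers(15, 2, 2))   # [15, 13]
```

In the second call the slot number 30 plus an adder input of 4 overflows the low three bits, which brings the sequence back to slot 26, that is, D13.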
[0039] 

Referring to Figure 2, the load store control unit 42 has a 5-bit incrementer at its output, and
load/store multiple operations are not subject to the register wrap-around that is applied to
vector operations. This allows a single load/store multiple instruction to access as many
consecutive registers as it requires.
[0040] 

An example of an operation using this wrap-around arrangement is an FIR filter split into units of
four signal values and four taps. If the syntax R8-R11 op R16-R19 denotes the vector operations
R8 op R16, R9 op R17, R10 op R18, and R11 op R19, the FIR filter operation can be performed as
follows.

Load 8 taps into R8-R15 and 8 signal values into R16-R23.
R8-R11 op R16-R19 and put the result into R24-R27.
R9-R12 op R16-R19 and accumulate the result in R24-R27.
R10-R13 op R16-R19 and accumulate the result in R24-R27.
R11-R14 op R16-R19 and accumulate the result in R24-R27.
Reload R8-R11 with new taps.
R12-R15 op R16-R19 and accumulate the result in R24-R27.
R13-R8 op R16-R19 and accumulate the result in R24-R27 (R15 -> wraps to R8).
R14-R9 op R16-R19 and accumulate the result in R24-R27 (R15 -> wraps to R8).
R15-R10 op R16-R19 and accumulate the result in R24-R27 (R15 -> wraps to R8).
Reload R12-R15 with new taps.
When the taps run out, reload R16-R19 with new data.
R12-R15 op R20-R23 and put the result into R28-R31.
R13-R8 op R20-R23 and accumulate the result in R28-R31 (R15 -> wraps to R8).
R14-R9 op R20-R23 and accumulate the result in R28-R31 (R15 -> wraps to R8).
R15-R10 op R20-R23 and accumulate the result in R28-R31 (R15 -> wraps to R8).
The remainder is as above.
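One accumulate step of this sequence, such as R13-R8 op R16-R19, can be illustrated with a small software model. This is a sketch only: the register bank is modelled as a Python list of 32 numbers, "op" is assumed to be a multiplication whose result is added to the destination, and the function names are hypothetical.

```python
def wrap_sequence(start, length, subset_size=8):
    """Registers start, start+1, ... wrapping inside the aligned subset."""
    mask = subset_size - 1
    base = start & ~mask
    return [base | ((start + i) & mask) for i in range(length)]


def vector_mac(regs, dest, src_a, src_b, length=4):
    """Element-wise multiply-accumulate: regs[dest+i] += regs[a+i] * regs[b+i].

    All three operands are stepped with wrap-around inside their own
    eight-register subset, mirroring the R13-R8 notation in the text.
    """
    for d, a, b in zip(wrap_sequence(dest, length),
                       wrap_sequence(src_a, length),
                       wrap_sequence(src_b, length)):
        regs[d] += regs[a] * regs[b]


regs = [0] * 32
regs[8:16] = [1, 2, 3, 4, 5, 6, 7, 8]     # illustrative taps in R8-R15
regs[16:20] = [10, 20, 30, 40]            # illustrative signal values in R16-R19
vector_mac(regs, dest=24, src_a=13, src_b=16)
print(regs[24:28])                        # [60, 140, 240, 40]
```

The last element uses R8 rather than R16 as the tap register, because the tap operand wraps from R15 back to R8 and does not run on into the signal values.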
[0041] 

As can be seen from the above, the loads are made to registers different from those used by the
multiply-accumulate operations, and so can take place in parallel (that is, double buffering can be
achieved).

[0042] 

Figure 7A shows schematically how the main processor 24 views a co-processor instruction. The main
processor uses a combination of bits in the instruction, field 76 (which may be split), to identify
the instruction as a co-processor instruction. In the standard ARM processor instruction set, a
co-processor instruction includes a co-processor number field 78. The co-processors attached to the
main processor use this co-processor number field 78 to determine whether a particular co-processor
instruction is addressed to them. Different types of co-processor, such as a DSP co-processor (for
example, the Piccolo co-processor made by ARM) or a floating-point unit co-processor, can be
assigned different co-processor numbers and so can be addressed separately within a single system
using the same co-processor bus 36. A co-processor instruction also contains an operation code used
by the co-processor and three 5-bit fields. These 5-bit fields specify, respectively, the
destination, the first operand, and the second operand from among the co-processor registers. For
some instructions, such as co-processor loads and stores, the main processor at least partially
decodes the co-processor instruction so that the co-processor and the main processor can together
perform the desired data processing operation. As part of the instruction decoding it performs in
such circumstances, the main processor can also decode the data type encoded in the co-processor
number.
[0043] 

Figure 7B shows how a co-processor that supports both double precision and single precision
operations decodes a received co-processor instruction. Such a co-processor is assigned two
adjacent co-processor numbers and uses the three most significant bits of the co-processor number
to determine whether it is the target co-processor. The least significant bit of the co-processor
number is then redundant for identifying the target co-processor, and so can be used to specify the
data type to be used when executing the co-processor instruction. In this example, the data type
corresponds to a data size of single precision or double precision.
[0044] 

In double precision mode, the number of registers is effectively reduced from 32 to 16. The
register field size could therefore be reduced, but the decoding of which register to use would
then not be available directly from a self-contained field at a known position in the co-processor
instruction, and would instead depend on the decoding of other parts of the co-processor
instruction. This would have the disadvantage of both complicating and slowing the operation of the
co-processor. By encoding the data type in the least significant bit of the co-processor number,
the operation code becomes completely independent of the data type, which simplifies and speeds up
its decoding.
[0045] 

Figure 7C shows how a co-processor that supports only a single data type, which is a subset of the
data types supported by the co-processor of Figure 7B, decodes a co-processor instruction. In this
case the full co-processor number is used to decide whether to accept the instruction. A
co-processor instruction for an unsupported data type therefore corresponds to a different
co-processor number and is not accepted. The main processor 24 then performs undefined instruction
exception handling and emulates the operation on the unsupported data type.
[0046] 

Figure 8 shows a data processing system containing an ARM core 80. The ARM core 80 serves as the
main processor and is connected via a co-processor bus 82 to a co-processor 84 that supports both
single precision and double precision data types. When a co-processor instruction, including its
co-processor number, is encountered in the instruction stream, it is issued by the ARM core 80 on
the co-processor bus 82. The co-processor 84 compares the co-processor number with its own numbers
and, if they match, sends an accept signal to the ARM core 80. If no accept signal is received, the
ARM core recognizes an undefined instruction exception and refers to the exception handling code
stored in the memory system 86.
[0047] 

Figure 9 shows the system of Figure 8 with the co-processor 84 replaced by a co-processor 88 that
supports only single precision operations. In this case the co-processor 88 recognizes only a
single co-processor number. Double precision co-processor instructions in the original instruction
stream, which would be executed by the co-processor 84 of Figure 8, are therefore not accepted by
the single precision co-processor 88. The undefined exception handling code in the memory system 86
can therefore include a double precision emulation routine so that the same code can be run.
[0048] 

Although the need to emulate double precision instructions makes those instructions slow to
execute, the single precision co-processor 88 can be smaller and cheaper than the double precision
co-processor 84, and this is advantageous when double precision instructions are sufficiently rare.
[0049] 

Figure 10 shows the instruction latch circuit in the co-processor 84, which supports both single
precision and double precision instructions and has two adjacent co-processor numbers. In this
case, the three most significant bits CP#[3:1] of the co-processor number in the co-processor
instruction are compared with the number assigned to the co-processor 84. In this example, if the
co-processor 84 has co-processor numbers 10 and 11, the comparison can be made by testing the three
most significant bits CP#[3:1] against binary 101. If they match, an accept signal is returned to
the ARM core 80 and the co-processor instruction is latched and executed.
[0050] 

Figure 11 shows the equivalent circuit in the single precision co-processor 88 of Figure 9. In this case, only 
one co-processor number is recognized and single precision operation is used by default. The comparison 
that decides whether a co-processor instruction should be accepted and latched is performed between all 
four bits CP#[3:0] of the co-processor number and binary 1010, the single embedded co-processor number. 
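
The two acceptance tests described for Figures 10 and 11 can be sketched in a few lines. This is only an illustrative model, not the latch circuit itself; the function names are hypothetical and the co-processor number is assumed to be passed in as a plain 4-bit integer.

```python
def accepts_dual_precision(cp_num: int) -> bool:
    """Model of co-processor 84 (numbers 10 and 11): compare CP#[3:1] with binary 101."""
    return ((cp_num >> 1) & 0b111) == 0b101


def accepts_single_precision(cp_num: int) -> bool:
    """Model of co-processor 88 (number 10 only): compare CP#[3:0] with binary 1010."""
    return (cp_num & 0b1111) == 0b1010


if __name__ == "__main__":
    # Print the acceptance decision of both models for a few co-processor numbers.
    for n in (9, 10, 11, 12):
        print(n, accepts_dual_precision(n), accepts_single_precision(n))
```

Dropping the least significant bit from the comparison is what lets one comparator cover two adjacent co-processor numbers.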

[0051] 

Figure 12 shows the flow when the undefined exception handling routine of the example of Figure 9 is 
invoked to run the double precision emulation code. The following steps are performed. It is determined 
whether the instruction that caused the undefined-instruction exception is a co-processor instruction 
with co-processor number binary 1011 (step 90). If so, the instruction is intended as a double precision 
instruction and can be emulated at step 92. Control then returns to the main program flow. Exception 
types that are not caught at step 90 can be detected and handled in further steps. 
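
As a rough sketch of the decision at step 90, the handler only needs the co-processor number of the faulting instruction. The code below assumes the standard ARM encoding in which the co-processor number occupies bits [11:8] of a co-processor instruction; the function names, the return strings and the `is_coprocessor_instr` flag are hypothetical and stand in for logic the text does not spell out.

```python
DOUBLE_PRECISION_CP = 0b1011  # co-processor number 11, used for double precision


def coprocessor_number(instr: int) -> int:
    """Extract CP# from bits [11:8] of a 32-bit ARM co-processor instruction."""
    return (instr >> 8) & 0xF


def handle_undefined(instr: int, is_coprocessor_instr: bool) -> str:
    """Step 90: decide whether the faulting instruction should be emulated."""
    if is_coprocessor_instr and coprocessor_number(instr) == DOUBLE_PRECISION_CP:
        return "emulate_double"   # step 92: run the double precision emulation code
    return "other_handlers"       # left to the further exception handling steps
```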
[0052] 

Figure 13 shows an embodiment that uses a format register FPREG 200 to store information identifying the 
type of data stored in each 32-bit register, or data slot, of a register bank 220. As stated above, each data 
slot can operate individually as a single precision register storing a 32-bit data value (one data word), 
or can be paired with another data slot to operate as a double precision register storing a 64-bit data 
value (two data words). According to the preferred embodiment of the invention, the format register 
FPREG 200 is arranged to identify whether a data slot holds single precision or double precision data. 
[0053] 

As shown in Figure 13, the 32 data slots of the register bank 220 are arranged as 16 pairs of data slots. 
In the preferred embodiment, if the first data slot of a pair stores a single precision data value, the other 
data slot of that pair stores only a single precision data value and is not linked with any other data slot 
to store a double precision data value. Every data slot pair therefore holds either two single precision data 
values or one double precision data value. This can be identified by one bit of information associated with 
each data slot pair in the register bank 220. In the preferred embodiment, the format register FPREG 200 is 
therefore arranged to store 16 bits of information identifying the type of data stored in each data slot pair 
of the register bank 220. The format register FPREG 200 can accordingly be implemented as a 16-bit register, 
or as a 32-bit register holding 16 bits of information for consistency with the other registers in the 
floating-point unit co-processor 26. 
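
A minimal sketch of such a format register follows, with one bit per data slot pair, set for double precision and clear otherwise. The class and method names are hypothetical, and the mapping of bit n to data slot pair n is an assumption made for illustration; the text only says that 16 bits are held.

```python
class FormatRegister:
    """Toy model of FPREG: 16 bits, one per pair of 32-bit data slots."""

    PAIRS = 16

    def __init__(self) -> None:
        self.bits = 0  # all pairs start as "not double precision"

    def record_write(self, slot: int, is_double: bool) -> None:
        """Update the bit for the pair containing `slot` (assumed: bit n <-> pair n)."""
        pair = slot // 2
        if is_double:
            self.bits |= 1 << pair
        else:
            self.bits &= ~(1 << pair) & 0xFFFF

    def holds_double(self, slot: int) -> bool:
        """Report whether the pair containing `slot` is marked double precision."""
        return bool((self.bits >> (slot // 2)) & 1)
```

For example, recording a double precision write to slot 10 marks the pair made of slots 10 and 11, and a later single precision write to slot 11 clears that mark again.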
[0054] 

Figure 15 shows six pairs of data slots in the register bank 220. In the preferred embodiment, these data 
slot pairs can be used to store six double precision data values or twelve single precision data values. 
Figure 15 shows an example of the data that may be stored in these data slots. DH denotes 
the 32 most significant bits of a double precision data value, DL denotes the 32 least significant bits of a 
double precision data value, and S denotes a single precision data value. 
[0055] 

The corresponding entries in the format register FPREG 200 according to the preferred embodiment of the 
invention are also shown in Figure 15. In the preferred embodiment, a value "1" stored in the format 
register FPREG 200 indicates that the corresponding data slot pair holds a double precision data value, 
while a value "0" indicates that at least one data slot of the corresponding pair holds a single precision 
data value, or that neither data slot has been initialized. That is, a logic "0" is stored in the 
corresponding bit of the format register FPREG 200 when neither data slot has been initialized, when one 
data slot of the pair is uninitialized and the other holds a single precision data value, or when both data 
slots of the pair hold single precision data values. 
[0056] 

As stated above, the FPU co-processor 26 of the preferred embodiment can process either single precision or 
double precision data values, and a co-processor instruction issued by the main processor 24 identifies 
whether it is a single precision instruction or a double precision instruction (see Figure 7B and the 
related description). When an instruction is accepted by the co-processor, it is passed to the register 
control and instruction issue unit 48 to be decoded and executed. If the instruction is a load instruction, 
the register control and instruction issue unit 48 directs the load store control unit 42 to fetch the 
specified data from memory and store it in the specified data slots of the register bank 220. At this stage 
the co-processor knows whether the data being fetched is single precision or double precision, and the load 
store control unit 42 operates accordingly. The load store control unit 42 therefore passes either a 32-bit 
single precision data value or a 64-bit double precision data value over the path 225 to the register bank 
input logic 230 for storage in the register bank 220. 

[0057] 

As well as the data being loaded into the register bank 220, information is supplied to the format register 
FPREG 200 so that the necessary bits can be set to identify whether each data slot pair receiving data from 
the load store control unit 42 stores single precision or double precision data. In the preferred 
embodiment, this information is stored in the format register FPREG 200 before the data is loaded into the 
register bank, so that it is available to the register bank input logic 230. 
[0058] 

In the preferred embodiment, the internal format of data in the register bank 220 is the same as the external 
format: a single precision data value is stored as a 32-bit data value and a double precision data 
value is stored as a 64-bit data value in the register bank 220. Because the register bank input logic 230 
has access to the format register FPREG 200, it knows whether the data it is receiving is single or double 
precision. In such an embodiment, the register bank input logic 230 merely arranges the data received over 
the path 225 for storage in the appropriate data slots of the register bank 220. In an alternative 
embodiment in which the internal format of the register bank differs from the external format, the register 
bank input logic 230 can be arranged to perform the necessary conversion. For example, a number is 
typically represented as 1.abc... multiplied by a base value raised to the power of an exponent. For 
efficiency, typical single and double precision representations do not use a data bit to represent the 1 to 
the left of the binary point; the 1 is implied. If for some reason the internal representation used in the 
register bank 220 needs to represent the 1 explicitly, the register bank input logic 230 performs the 
necessary data conversion. In such an embodiment, the data slots would typically be somewhat larger than 
32 bits in order to hold the additional data generated by the register bank input logic 230. 
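
To illustrate the kind of conversion meant here, the sketch below expands a 32-bit IEEE 754 single precision word into an internal form with an explicit leading 1 (so the significand needs 24 bits rather than 23) and converts it back. The function names and the tuple layout are hypothetical, and only normalized values are handled; this is an illustration of the idea, not the register bank logic of the embodiment.

```python
import struct
from typing import Tuple


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_internal(bits: int) -> Tuple[int, int, int]:
    """Input-logic direction: make the implied leading 1 explicit."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if not 0 < exponent < 0xFF:
        raise ValueError("this sketch handles normalized values only")
    return sign, exponent, (1 << 23) | fraction  # 24-bit significand


def to_external(sign: int, exponent: int, significand: int) -> int:
    """Output-logic direction: drop the explicit 1 again."""
    return (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)
```

The internal form carries one more bit than the 32-bit external word, which is why a data slot in such an embodiment would be wider than 32 bits.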
[0059] 

The load store control unit 42 can load data values not only into the register bank 220 but also into one or 
more system registers of the co-processor 26, for example the user status and control register FPSCR 210. 
In the preferred embodiment, the user status and control register FPSCR 210 contains user-accessible 
configuration bits and exception status bits. It is described in more detail in the architectural 
description of the floating-point unit given at the end of the description of the embodiments. 
[0060] 

When the register control and instruction issue unit 48 receives a store instruction specifying data slots 
in the register bank 220 whose contents are to be stored in memory, it instructs the load store control 
unit 42 accordingly, and the required data words are read from the register bank 220 to the load store 
control unit 42 via the register bank output logic 240. The register bank output logic 240 accesses the 
contents of the FPREG register 200 to determine whether the data being read is single precision or double 
precision data. It then applies the appropriate data conversion, reversing any conversion performed by the 
register bank input logic 230, and supplies the data to the load store control unit 42 over the path 235. 
[0061] 

According to the preferred embodiment of the invention, if the store instruction is a double precision 
instruction, the co-processor 26 can be considered to be operating in a second mode of operation in which 
instructions are applied to double precision data values. Since a double precision data value comprises an 
even number of data words, a store instruction issued in the second mode of operation would normally 
specify an even number of data slots whose contents are to be stored in memory. According to the preferred 
embodiment, however, if an odd number of data slots is specified, the load store control unit 42 reads the 
contents of the FPREG register 200, stores those contents in memory first, and then stores the specified 
even number of data slots from the register bank 220. The data slots to be transferred are typically 
specified by a base address identifying a particular data slot in the register bank, followed by a number 
giving how many data slots (that is, how many data words), counting from the identified data slot, are to 
be stored. 
[0062] 

Thus, for example, if the store instruction gives the first data slot in the register bank 220 as the base 
address and specifies 33 data slots, the contents of all 32 data slots are stored in memory and, because the 
specified number of data slots is odd, the contents of the FPREG register 200 are stored in memory as 
well. 
[0063] 

With this approach, a single instruction can store in memory both the contents of the register bank and the 
contents of the FPREG register 200, which identifies the data types stored in the data slots of the register 
bank 220. This avoids the need to issue a separate instruction to store the contents of the FPREG register 
200 explicitly, and therefore avoids degrading processing speed during store-to-memory or 
load-from-memory operations. 
[0064] 

In a further embodiment of the invention, this technique can be taken one step further so that an additional 
system register 210, for example the FPSCR register, can also be stored in memory with a single instruction 
if required. Consider the example discussed above of a register bank 220 with 32 data slots. If the store 
instruction specifies 33 data slots, the FPREG register 200 is stored in memory along with the contents of 
the 32 data slots of the register bank 220. If, however, a different odd number exceeding the number of data 
slots in the register bank is specified, for example 35, the load store control unit 42 can interpret this 
as a request to store in memory the contents of the FPSCR register 210 as well as the contents of the FPREG 
register 200 and the data slots of the register bank 220. The co-processor can also provide further system 
registers, for example exception registers identifying exceptions that occurred while the co-processor was 
processing instructions. If a different odd number, for example 37, is specified in the store instruction, 
the load store control unit 42 can interpret this as a request to store the contents of one or more 
exception registers in addition to the contents of the FPSCR register 210, the FPREG register 200 and the 
register bank 220. 
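
The interpretation of odd slot counts in this example (33, 35 and 37 for a 32-slot bank) can be sketched as a lookup. The table below simply encodes the example numbers from the text; the names `BANK_SLOTS`, `EXTRA_REGISTERS` and `system_registers_for` are hypothetical, and treating any other odd count as "FPREG only" is an assumption made for the sketch.

```python
BANK_SLOTS = 32  # data slots in the register bank of the example

# Odd excess over the bank size -> system registers transferred (example mapping).
EXTRA_REGISTERS = {
    1: ["FPREG"],                        # 33 slots specified
    3: ["FPREG", "FPSCR"],               # 35 slots specified
    5: ["FPREG", "FPSCR", "EXCEPTION"],  # 37 slots specified
}


def system_registers_for(count: int) -> list:
    """Return the system registers implied by the slot count of a double precision transfer."""
    if count % 2 == 0:
        return []  # an even count transfers register bank contents only
    # Assumption: any other odd count requests the FPREG register alone.
    return EXTRA_REGISTERS.get(count - BANK_SLOTS, ["FPREG"])
```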
[0065] 

This technique is particularly useful when the code issuing the store or load instruction has no knowledge 
of the contents of the register bank and those contents are merely being saved temporarily to memory for 
later retrieval into the register bank. If the code knew the contents of the register bank, it might not be 
necessary to store the contents of the FPREG register 200 in memory. Typical examples of code that has no 
knowledge of the register bank contents are context switch code and procedure call entry and exit 
routines. 
[0066] 

In such cases, the contents of the FPREG register 200 can be stored in memory efficiently along with the 
contents of the register bank and, as described above, other system registers can also be stored if 
required. 
[0067] 

A similar process is performed when a subsequent load instruction is received. If a double precision load 
instruction specifying an odd number of data slots is received, the load store control unit 42 loads the 
stored FPREG contents into the FPREG register 200, then loads any system registers indicated by the number 
of slots specified in the load instruction, and then loads an even number of data words into the specified 
data slots of the register bank 220. In the example described above, if the number of data slots specified 
in the load instruction is 33, the stored FPREG contents are loaded into the FPREG register 200, followed 
by the contents of the 32 data slots. Similarly, if the number of data slots specified in the load 
instruction is 35, the contents of the FPSCR register 210 are loaded into the FPSCR register in addition to 
the above. Finally, if the specified number of data slots is 37, the contents of the exception registers 
are loaded into those exception registers in addition to the above. It will be clear to those skilled in 
the art that the particular actions associated with particular odd numbers are entirely arbitrary and can 
be varied as desired. 
[0068] 

Figure 14 is a flow chart showing the operation of the register control and instruction issue logic 48 
according to the preferred embodiment of the invention when executing store and load instructions. First, 
in step 300, the number of data words (which in the preferred embodiment is the same as the number of data 
slots) is read from the instruction, together with the first register number specified in the instruction, 
i.e. the base register. Then, in step 310, it is determined whether the instruction is a double precision 
instruction. As stated above, this information is available to the co-processor at this stage, because the 
instruction identifies whether it is a double precision or a single precision instruction. 
[0069] 

If the instruction is a double precision instruction, the process proceeds to step 320, where it is 
determined whether the number of words specified in the instruction is odd. Assuming for this example that 
the technique for selectively transferring further system registers in addition to the FPREG register 200 
is not used, an odd number of words indicates that the contents of the FPREG register 200 should be 
transferred, and so in step 325 the contents of the FPREG register are transferred by the load store 
control unit 42. The number of words is then decremented by 1 at step 327, and the process proceeds to step 
330. If the number of words is determined to be even in step 320, the process proceeds directly to step 
330. 
[0070] 

In step 330, it is determined whether the number of words is greater than zero. If not, the instruction is 
considered complete and the process exits at step 340. If the number of words is greater than zero, the 
process proceeds to step 332, where a double precision data value (that is, the contents of two data slots) 
is transferred to or from the first specified register number. Then, in step 334, the number of words is 
decremented by 2, and in step 336 the register number is incremented by 1. As stated above, a register in a 
double precision instruction actually consists of two data slots, so incrementing the register number by 1 
is equivalent to incrementing the data slot number by 2. 
[0071] 

The process then returns to step 330, where it is determined whether the number of words is still greater 
than zero. If so, the process is repeated. When the number of words reaches zero, the process exits at step 
340. 
[0072] 

If in step 310 the instruction is determined not to be a double precision instruction, the process proceeds 
to step 350, where it is again determined whether the number of words is greater than zero. If so, the 
process proceeds to step 352, where a single precision data value is transferred to or from the first 
register number specified in the instruction. Then, in step 354, the number of words is decremented by 1, 
and in step 356 the register number is incremented by 1 so that it points to the next data slot. The 
process then returns to step 350, where it is determined whether the number of words is still greater than 
zero. If so, the process is repeated until the number of words reaches zero, whereupon the process exits at 
step 360. 
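
The flow of Figure 14 can be sketched as a function that lists the transfers in order. This is an illustrative reading of the steps, not the issue logic itself: the function name and the tuple labels are hypothetical, and the sketch assumes that step 327 reduces the word count by one so that an even number of words remains for the double precision loop.

```python
def transfer_plan(base_reg: int, words: int, is_double: bool) -> list:
    """Return the ordered transfers for a load or store of `words` data words."""
    plan = []
    reg = base_reg                       # step 300: base register and word count
    if is_double:                        # step 310: double precision instruction?
        if words % 2 == 1:               # step 320: odd number of words?
            plan.append("FPREG")         # step 325: transfer the format register
            words -= 1                   # step 327 (assumed to be a decrement)
        while words > 0:                 # step 330
            plan.append(("D", reg))      # step 332: one double value, two data slots
            words -= 2                   # step 334
            reg += 1                     # step 336
    else:
        while words > 0:                 # step 350
            plan.append(("S", reg))      # step 352: one single value, one data slot
            words -= 1                   # step 354
            reg += 1                     # step 356
    return plan                          # steps 340 and 360: done
```

With a double precision instruction, base register 0 and 33 words, the plan starts with the FPREG transfer and is followed by the 16 double precision registers, matching the 33-slot example above.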
[0073] 

The above approach provides great flexibility when executing code that has no knowledge of the contents of 
the register bank, for example context switch code and procedure call entry and exit routines. In these 
cases the operating system has no knowledge of the contents of the registers, and it is desirable that it 
should not have to treat the registers differently depending on their contents. With the above approach, 
these code routines can be written with a single store or load instruction specifying an odd number of data 
words. If the co-processor needs to use register content information, it interprets the odd number of data 
words in the instruction as a request also to store to memory, or load from memory, the format information 
needed to identify the contents of the data in the register bank. This flexibility removes the need for 
special operating system software to support co-processors that require register content information. 
[0074] 

This approach also removes the need to load or store the register content information in a separate 
operation within the code. Because the option of loading or storing the register content information is 
incorporated in the instruction, no additional memory access is required. This shortens the code and can 
potentially save time. 
[0075] 

An architectural description of a floating-point unit incorporating the above technique is given below. 

[0076] 

1. Introduction 

VFPv1 is a floating point system (FPS) architecture designed to be implemented as a co-processor for use 
with ARM processor modules. Implementations of this architecture may incorporate features in either 
hardware or software, or an implementation may use software to complement the functionality or to provide 
IEEE 754 compatibility. This specification intends to achieve full IEEE 754 compatibility using a 
combination of hardware and software support. 
[0077] 

Two co-processor numbers are used by VFPv1: 10 is used for operations with single precision operands, 
and 11 is used for operations with double precision operands. Conversion between single precision and 
double precision data is accomplished with two conversion instructions which operate in the co-processor 
space of the source operand. 
[0078] 

The features of the VFPv1 architecture are as follows. 

- Full compatibility with IEEE 754 in hardware with support code. 

- 32 single precision registers, each addressable as a source operand or a destination register. 

- 16 double precision registers, each addressable as a source operand or a destination register. (The double 
precision registers overlap the physical single precision registers.) 

- Vector mode, which provides a significant increase in floating point code density and in concurrency with 
load and store operations. 

- Four banks of eight circulating single precision registers, or four banks of four circulating double 
precision registers, to enhance DSP and graphics operations. 

- A denormal handling option that selects either IEEE 754 compatibility (with the intended support from the 
floating point emulation package) or fast flush-to-zero capability. 

- Implementation of a fully pipelined, chained ****** that produces IEEE 754 compatible results. 

- Fast floating point to integer conversion for C, C++ and Java using the FFTOSIZ instruction. 
[0079] 

An implementer may implement VFPv1 entirely in hardware, or may use a combination of hardware and 
support code. VFPv1 can also be implemented entirely in software. 

[0080] 

2. Terminology 

This specification uses the following terminology. 

Automatic exception: An exceptional condition that always bounces to the support code, regardless of the 
value of the respective exception enable bit. The choice of which exceptions, if any, are automatic is an 
implementation option. Refer to section 0. 
[0081] 

Exception handling 
[0082] 

Bounce: An exception reported to the operating system that is handled entirely by the support code, 
without calling a user trap handler or otherwise interrupting the normal flow of user code. 
[0083] 

CDP: Coprocessor Data Processing. In the FPS (floating point system), CDP operations are arithmetic 
operations rather than load or store operations. 

[0084] 

ConvertToUnsignedInteger(Fm): Converts the contents of Fm to an unsigned 32-bit integer value. The result 
depends on the rounding mode for the final rounding and on the handling of floating point values outside 
the range of a 32-bit unsigned integer. An INVALID exception may occur if the floating point input value is 
negative or is too large for a 32-bit unsigned integer. 
[0085] 

ConvertToSignedInteger(Fm): Converts the contents of Fm to a signed 32-bit integer value. The result 
depends on the rounding mode for the final rounding and on the handling of floating point values outside 
the range of a 32-bit signed integer. An INVALID exception may occur if the floating point input value is 
too large for a 32-bit signed integer. 
[0086] 

ConvertUnsignedIntToSingle/Double(Rd): Converts the contents of the ARM register (Rd), interpreted as a 
32-bit unsigned integer value, to a single precision or double precision floating point value. If the 
destination precision is single precision, an INEXACT exception may occur in the conversion operation. 
[0087] 

ConvertSignedIntToSingle/Double(Rd): Converts the contents of the ARM register (Rd), interpreted as a 
32-bit signed integer value, to a single precision or double precision floating point value. If the 
destination precision is single precision, an INEXACT exception may occur in the conversion operation. 
[0088] 

Denormalized value: A representation of a value in the range (-2^Emin < x < 2^Emin). In the IEEE 754 
format for single and double precision operands, a denormalized value has a zero exponent and a leading 
bit of 0 rather than 1. The IEEE 754-1985 specification requires that the generation and manipulation of 
denormalized operands be performed with the same precision as for normalized operands. 
[0089] 
Disabled exception: An exception is said to be disabled when its associated exception enable bit in the 
FPSCR is set to 0. The IEEE 754 specification defines the correct result to be returned for these 
exceptions. An operation that generates the exception condition bounces to the support code to produce the 
result defined by IEEE 754. The exception is not reported to the user exception handler. 
[0090] 

Enabled exception: An exception whose associated exception enable bit is set to 1. When this exception 
occurs, a trap to the user handler is taken. An operation that generates the exception condition bounces to 
the support code to produce the result defined by IEEE 754. The exception is then reported to the user 
exception handler. 
[0091] 

Exponent: The component of a floating point number that signifies the integer power to which 2 is raised 
in determining the value of the represented number. The exponent is occasionally called the signed or 
unbiased exponent. 
[0092] 

Fraction: The field of the significand that lies to the right of its implied binary point. 
[0093] 

Flush-To-Zero Mode: In this mode, all values in the range (-2^Emin < x < 2^Emin) after rounding are treated 
as zero rather than being converted to a denormalized value. 

[0094] 

High (Fn/Fm): The upper 32 bits [63:32] of a double precision value as represented in memory. 
[0095] 

IEEE 754-1985: "IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE Std 754-1985, The 
Institute of Electrical and Electronics Engineers, Inc., New York, New York, 10017. This standard, often 
referred to as the IEEE 754 standard, defines data types, correct operation, exception types and handling, 
and error bounds for floating point systems. Most processors are built to conform to this standard, either 
in hardware or in a combination of hardware and software. 
[0096] 

Infinity: An IEEE 754 special format used to represent infinity. The exponent is the maximum for the 
precision and the significand is all zeros. 

[0097] 

Input exception: An exception condition in which one or more of the operands for a given operation are not 
supported by the hardware. The operation bounces to the support code for completion. 
[0098] 

Intermediate result: An internal format used to store the result of a calculation before rounding. This 
format may have a larger exponent field and significand field than the destination format. 
[0099] 

Low (Fn/Fm): The lower 32 bits [31:0] of a double precision value as represented in memory. 
[0100] 

MCR: "Move to Coprocessor from ARM Register". For the FPS, this includes instructions which transfer data 
or control registers between an ARM register and an FPS register. Only 32 bits of information can be 
transferred with a single MCR class instruction. 
[0101] 

MRC: "Move to ARM Register from Coprocessor". For the FPS, this includes instructions which transfer data 
or control registers between the FPS and an ARM register. Only 32 bits of information can be transferred 
with a single MRC class instruction. 
[0102] 

NaN: Not a Number, a symbolic entity encoded in a floating point format. There are two types of NaN: 
signaling and non-signaling, or quiet. A signaling NaN causes an Invalid Operand exception if used as an 
operand. A quiet NaN propagates through almost every arithmetic operation without signaling an exception. 
The format for a NaN has an exponent field of all ones and a non-zero significand. To represent a signaling 
NaN, the most significant bit of the fraction is zero; for a quiet NaN, the most significant bit is set to 
1. 
[0103] 

Reserved: A field in a control register or instruction format is "reserved" if the field is to be defined 
by the implementation, or if UNPREDICTABLE results would be produced when the contents of the field are 
not zero. These fields are reserved for use in future extensions of the architecture, or are used only by a 
specific implementation. All reserved bits not used by the implementation must be written as zero and are 
read as zero. 
[0104] 

Rounding Mode: The IEEE 754 specification requires all calculations to be performed as if to infinite 
precision. That is, a multiply of two single precision values must accurately calculate the significand to 
twice the number of bits of the significand. To represent this value in the destination precision, rounding 
of the significand is often required. The IEEE 754 standard specifies four rounding modes: round to nearest 
(RN), round to zero or chop (RZ), round to plus infinity (RP), and round to minus infinity (RM). The first 
mode rounds at the midpoint; in the tie case the value is rounded up if doing so zeroes the least 
significant bit of the significand, making the number "even". The second mode effectively chops any bits to 
the right of the significand, always rounding toward zero. It is used by the C, C++ and Java languages in 
integer conversions. The last two modes are used in interval arithmetic. 
[0105] 

Significand: The component of a binary floating point number that consists of an explicit or implicit 
leading bit to the left of its implied binary point and a fraction field to the right. 
[0106] 

Support Code: Software which must be used to complement the hardware in order to provide compatibility with 
the IEEE 754 standard. The support code has two components. One is a library of routines which perform 
operations beyond the scope of the hardware, such as transcendental computations, as well as supported 
functions, such as divide, with unsupported inputs or with inputs which may generate an exception. The 
other is a set of exception handlers which process exception conditions so as to conform to IEEE 754. The 
support code is required to perform the implemented functions and to emulate proper handling of any 
unsupported data type or data representation (for example, denormal values or decimal data types). The 
routines may be written to use the FPS in their intermediate calculations, provided care is taken to 
restore the user's state at the exit of the routine. 
[0107] 

Trap: An exception condition whose respective exception enable bit is set in the FPSCR. The user's trap 
handler is executed. 

[0108] 

UNDEFINED: Indicates an instruction that generates an undefined instruction trap. See the ARM 
Architectural Reference Manual for more information on ARM exceptions. 
[0109] 

It is the result of an instruction or control register field value which cannot be UNPREDICTABLE(ed) : 
(**********) trusted. It seems that an instruction or a result do not become a fault on secrecy, or any parts 
of a processor or a system are not stopped. [********** ] 
[0110]

Unsupported data: Specific data values that are not processed by the hardware but are bounced to the
support code for completion. These data may include infinities, NaNs, denormal values, and zeros. An
implementation is free to select which of these values are supported in hardware, fully or partially, and
which require assistance from the support code to complete the operation. Any exception resulting from
processing unsupported data is trapped to user code if the corresponding exception enable bit is set.
[0111] 

3. Register file 3.1 Introduction The architecture provides 32 single-precision registers and 16
double-precision registers, all individually addressable within a fully defined 5-bit register index as
source or destination operands.
[0112] 

The 32 single-precision registers overlap the 16 double-precision registers; that is, writing
double-precision data to D5 overwrites the contents of S10 and S11. A compiler or assembly language
programmer must be aware that, in an overlapped implementation, using a register as single-precision data
storage conflicts with using it as one half of a double-precision data storage. No hardware is provided to
restrict the use of a register to one precision, so the result is UNPREDICTABLE if this rule is violated.
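The overlap of D5 with S10 and S11 can be illustrated with a small software model. This is only a sketch: the byte-array register file, the helper names, and the choice of placing the low word of a double in the even-numbered single-precision register are assumptions made for illustration and are not stated in the text.

```python
import struct

# Hypothetical model: the register file as 32 words of 32 bits each.
regs = bytearray(32 * 4)


def write_single(s, value):
    """Write a single-precision value to register S<s>."""
    struct.pack_into("<f", regs, s * 4, value)


def write_double(d, value):
    """Write a double to D<d>, which occupies S<2d> and S<2d+1>.

    Assumption: the low 32 bits land in the even-numbered register.
    """
    struct.pack_into("<d", regs, 2 * d * 4, value)


def read_single_bits(s):
    """Return the raw 32-bit pattern currently held in S<s>."""
    return struct.unpack_from("<I", regs, s * 4)[0]


write_single(10, 1.5)   # S10 holds a single-precision value
write_double(5, 1.0)    # writing D5 overwrites both S10 and S11
s10_bits = read_single_bits(10)
s11_bits = read_single_bits(11)
```

After the write to D5, S10 no longer holds 1.5; both words now carry halves of the double.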
[0113] 

VFPv1 provides access to these registers in a scalar mode or a vector mode. In scalar mode, one, two, or
three operand registers are used to produce a result that is written to a destination register. In vector
mode, the specified operands refer to a group of registers. VFPv1 supports vector operations on up to
eight elements in one instruction for single-precision operands and on up to four elements for
double-precision operands.

[Table 1: LEN bit encoding]
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[0114] 

Vector mode is enabled by writing a non-zero value to the LEN field. If the LEN field contains 0, the FPS
operates in scalar mode, and the register fields are decoded as addressing 32 single-precision registers or
16 double-precision registers in a flat register model. If the LEN field is non-zero, the FPS operates in
vector mode, and the register fields are decoded as addressing vectors of registers. See Table 1 for the
encoding of the LEN field.
[0115] 

Scalar and vector operations can be mixed without changing the LEN field by choosing the destination
register accordingly. A scalar operation can be specified while in vector mode if the destination register
is in the first bank of registers (S0-S7 or D0-D3). For details, refer to section 0.
[0116] 

3.2 Single-precision register usage When the LEN field in the FPSCR is 0, 32 single-precision registers,
S0-S31, are available. Any of these registers can be used as a source or destination register.
The single-precision (coprocessor 10) register map is shown in example of drawing 1.
[0117] 

When the LEN field in the FPSCR is greater than 0, the register file behaves as four banks of eight
circulating registers each, as shown in drawing 2. The first bank of vector registers, V0-V7, overlaps the
scalar registers S0-S7, and these registers are addressed as scalars or as vectors according to the
register selected for each operand. Refer to section 0 and section 3.4, Register usage, for details.
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[0118] 

For example, if LEN in the FPSCR is set to 3, referencing vector V10 involves registers S10, S11, S12, and
S13 in the vector operation. Similarly, referencing V22 involves S22, S23, S16, and S17. When the register
file is accessed in vector mode, the register that follows V7 is V0; likewise, V8 follows V15, V16 follows
V23, and V24 follows V31.
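The wrap-around within a bank can be sketched as follows. The function name is hypothetical, and the sketch assumes that a LEN value of n selects n + 1 consecutive elements, which is inferred from the example above (LEN = 3 involving four registers) rather than from the Table 1 encoding.

```python
def vector_registers(base, len_field, bank_size=8):
    """List the registers touched when vector V<base> is referenced.

    Registers circulate inside their own bank: after the last register of
    the bank, the sequence continues at the first register of that bank.
    Assumption: LEN = n selects n + 1 elements.
    """
    count = len_field + 1
    bank_start = (base // bank_size) * bank_size
    offset = base - bank_start
    return [bank_start + (offset + i) % bank_size for i in range(count)]


v10 = vector_registers(10, 3)   # stays inside bank S8-S15
v22 = vector_registers(22, 3)   # wraps from S23 back to S16
```

For double precision the same function applies with `bank_size=4`, since each bank then holds four registers.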
[0119] 

3.3 Double-precision register usage When the LEN field in the FPSCR is 0, 16 double-precision scalar
registers are available. Any register can be used as a source or destination register. The register map is
shown in example of drawing 3.

[0120] 

When the LEN field in the FPSCR is greater than 0, four scalar registers and 16 vector registers are
available, arranged as four banks of four circulating registers each, as shown in drawing 4. The first bank
of vector registers, V0-V3, overlaps the scalar registers D0-D3. Registers are addressed as scalars or as
vectors according to the register selected for each operand. Refer to section 0 and section 3.4, Register
usage, for details. As in the single-precision example of section 0, the double-precision registers
circulate within their four banks.

[0121] 

3.4 Register usage Three kinds of operation between scalars and vectors are supported. (OP2 may be any of
the two-operand operations supported by the floating-point coprocessor; OP3 may be any of the
three-operand operations.)
[0122]

In the following explanation, the "first bank" of the register file is defined as registers S0-S7 for
single-precision operations and as registers D0-D3 for double-precision operations.

- ScalarD = OP2 ScalarA, or ScalarD = ScalarA OP3 ScalarB, or ScalarD = ScalarA * ScalarB + ScalarD
- VectorD = OP2 ScalarA, or VectorD = ScalarA OP3 VectorB, or VectorD = ScalarA * VectorB + VectorD
- VectorD = OP2 VectorA, or VectorD = VectorA OP3 VectorB, or VectorD = VectorA * VectorB + VectorD

[0123] 

3.4.1 Scalar operations The FPS operates in scalar mode under either of two conditions.
[0124] 

1. The LEN field in the FPSCR is 0. The destination and source registers may be any of scalar registers
0-31 for single-precision operations and any of 0-15 for double-precision operations. The operation is
performed only on the registers explicitly specified in the instruction.
[0125] 

2. The destination register is in the first bank of the register file. The source scalars may be any of
the other registers. This mode allows scalar and vector operations to be mixed without changing the LEN
field in the FPSCR.
[0126] 

3.4.2 Operations with a scalar source and a vector source, and a vector destination Operations in this
mode are performed when the LEN field in the FPSCR is greater than zero and the destination register is not
in the first bank of the register file. The source scalar register may be any register in the first bank of
the register file, and any of the remaining registers may be used for VectorB. The behavior is
UNPREDICTABLE if the source scalar register is a member of VectorB, or if VectorD overlaps VectorB in fewer
than LEN elements; that is, VectorD and VectorB must be either the same vector or completely distinct in
all members. Refer to the summary table of section 0.

[0127] 

3.4.3 Operations involving only vector data Operations in this mode are performed when the LEN field in
the FPSCR is greater than zero and the destination vector register is not in the first bank of the register
file. Each element of VectorA is combined with the corresponding element of VectorB, and the result is
written to VectorD. Any register that is not in the first bank of the register file may be used for
VectorA, and all vectors may be used for VectorB. As in the second case, the behavior is UNPREDICTABLE if a
source vector and the destination vector overlap in fewer than LEN elements; they must be either identical
or completely distinct in all members. Refer to the summary table of section 0.

For the FMAC family of operations, the destination register or vector is always the accumulate register or
vector.
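The three cases of sections 3.4.1 to 3.4.3 can be summarized as a small decision function. This is a sketch of the rules as stated above, not a description of the hardware decoder; the function name, the return labels, and the convention that `src_a` is the operand playing the role of ScalarA or VectorA are assumptions.

```python
def classify_operation(len_field, dest, src_a, bank_size=8):
    """Classify an operation from LEN and the register numbers involved.

    bank_size is 8 for single precision (first bank S0-S7) and
    4 for double precision (first bank D0-D3).
    """
    in_first_bank = lambda reg: reg < bank_size
    if len_field == 0 or in_first_bank(dest):
        return "scalar"            # section 3.4.1, conditions 1 and 2
    if in_first_bank(src_a):
        return "scalar-vector"     # section 3.4.2: scalar A combined with VectorB
    return "vector"                # section 3.4.3: vector data only


kinds = [
    classify_operation(0, 12, 20),   # LEN is 0
    classify_operation(3, 4, 20),    # destination in the first bank
    classify_operation(3, 12, 2),    # scalar source from the first bank
    classify_operation(3, 12, 20),   # all operands outside the first bank
]
```

The overlap restrictions (VectorD against VectorB, and the scalar source against VectorB) are not checked here; violating them is described as UNPREDICTABLE.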

[0128] 

3.4.4 Operation summary tables The following tables show the register usage options for single- and
double-precision 2- and 3-operand instructions. 'Any' indicates that all registers in that precision may
be used for the specified operand.
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[0129] 
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4. Instruction set A FPS instruction can be divided into three categories. 

- MCR, MRC: Transfers between the ARM and the FPS
- LDC, STC: Loads and stores between the FPS and memory
- CDP: Data processing
[0130]

4.1 Instruction concurrency The FPS architecture specification is intended to allow concurrency at two
levels: pipelined functional units, and load/store operations running in parallel with CDP functions. A
significant performance gain is obtained when load and store operations that have no register dependency
on the operations currently in progress are allowed to execute in parallel with those operations.
[0131] 

4.2 Instruction serialization The FPS specifies a single instruction that waits until all currently
executing FPS instructions have completed and the exception status of each is known. If an exception is
pending, the serializing instruction is aborted and exception processing begins in the ARM. The
serializing instruction in the FPS is:
- FMOVX: Read or write a floating-point system register.
[0132] 

A read or write of a floating-point system register is stalled until the current instructions have
completed. An FMOVX to the System ID Register (FPSID) triggers any exception caused by a preceding
floating-point instruction. A read/modify/write of the User Status and Control Register (FPSCR), using
FMOVX, can be used to clear the exception status bits (FPSCR[4:0]).
[0133] 

4.3 Conversions involving integer data Conversion between floating-point and integer data is a two-step
process in the FPS: a data transfer instruction that moves the integer data, and a CDP instruction that
performs the conversion. If an arithmetic operation is attempted on integer data held in an FPS register
in integer format, the result is UNPREDICTABLE, and such operations must be avoided.

[0134] 

4.3.1 Conversion of integer data in an FPS register to floating-point data Integer data can be loaded into
a floating-point single-precision register from an ARM register using the MCR FMOVS instruction. The
integer data in the FPS register is then converted to a single- or double-precision floating-point value
by the integer-to-floating-point family of operations and written to the destination FPS register. The
destination register may be the source register if the integer value is no longer needed. The integer may
be a signed or unsigned 32-bit quantity.

[0135] 

4.3.2 Conversion of floating-point data in an FPS register to integer data The value in an FPS single- or
double-precision register can be converted to signed or unsigned 32-bit integer format by the
floating-point-to-integer family of operations. The resulting integer is placed in the destination
single-precision register. The integer data can be stored to an ARM register using the MRC FMOVS
instruction.
[0136] 

4.4 Register file addressing Instructions operating in single-precision space (S = 0) use 5 bits of the
instruction field for operand access. The upper 4 bits are contained in the operand fields labeled Fn, Fm,
or Fd. The least significant bit of the address is contained in N, M, or D, respectively.

[0137] 

Instructions operating in double-precision space (S = 1) use only the upper 4 bits of the operand address.
These 4 bits are contained in the Fn, Fm, and Fd fields. The N, M, and D bits must be 0 when the
corresponding operand field contains an operand address.
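The addressing rule can be sketched as a decoder that turns a 4-bit operand field and its companion bit into a register number. The function name and error handling are illustrative assumptions; only the bit layout follows the text.

```python
def register_index(field, low_bit, double_precision):
    """Decode an operand register number from its instruction fields.

    field:     4-bit Fn, Fm, or Fd value (upper bits of the address)
    low_bit:   the matching N, M, or D bit
    double_precision: True when S = 1, False when S = 0
    """
    if not 0 <= field <= 0xF or low_bit not in (0, 1):
        raise ValueError("field must be 4 bits and low_bit must be 0 or 1")
    if double_precision:
        if low_bit != 0:
            raise ValueError("N, M, or D must be 0 for double-precision operands")
        return field                  # D0-D15 use only the 4-bit field
    return (field << 1) | low_bit     # S0-S31 use all 5 bits


s_reg = register_index(0b0101, 1, double_precision=False)   # S11
d_reg = register_index(0b0101, 0, double_precision=True)    # D5
```

With this layout, the fields that select D5 differ from those that select S10 and S11 only in the low bit, which matches the register overlap described in section 3.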
[0138] 

4.5 MCR (Move to Coprocessor from ARM Register)

MCR operations transfer data held in ARM registers to the FPS, or use that data in the FPS. This includes
moving data in single-precision format from an ARM register, or in double-precision format from a pair of
ARM registers, to an FPS register; loading a signed or unsigned integer value from an ARM register into a
single-precision FPS register; and loading the contents of an ARM register into a control register.

The format of the MCR instruction is shown in example of drawing 5.



http://www4.ipdl.ncipi.go.jp/cgi-bin/tran_web_cgi_ejje 10/5/2006 



JP,2002-517038,A [DETAILED DESCRIPTION] Page 21 of 38 



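[Drawing 5: MCR instruction format]

Bits:   31-28  27-24  23-21   20  19-16  15-12  11-8  7  6-5  4  3-0
Field:  COND   1110   Opcode  0   Fn     Rd     101S  N  RR   1  (label not recoverable)

Bit field definitions (only partly recoverable from the source):

Opcode
Rd
S       1: (entry not recoverable)
N
Fn      0000 - FPID (coprocessor ID number)
        0001 - FPSCR (user status and control register)
        0100 - FPREG (register file content register)
R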




[Table: MCR opcode field definitions]

Opcode    Name      Operation
000       FMOVS     Fn = Rd (32 bits, coprocessor 10)
000       FMOVLD    Low(Fn) = Rd (lower 32 bits of a double-precision register, coprocessor 11)
001       FMOVHD    High(Fn) = Rd (upper 32 bits of a double-precision register, coprocessor 11)
010-110             (no entry recoverable)
111       FMOVX     System Reg = Rd (coprocessor 10 space)



[0139] 

Note: Only 32-bit data operations are supported by the FMOV[S, HD, LD] instructions. FMOVS moves data only
between an ARM register and a single-precision register. To transfer a double-precision operand from two
ARM registers, the FMOVLD and FMOVHD instructions move the lower half and the upper half, respectively.
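The split of a double-precision operand into the two 32-bit halves moved by FMOVLD and FMOVHD can be sketched in software. The helper names are hypothetical; the sketch only shows the bit-level split of an IEEE 754 double and does not model the instructions themselves.

```python
import struct


def split_double(value):
    """Return (low, high) 32-bit words of the IEEE 754 double encoding."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_double(low, high):
    """Rebuild the double from its lower and upper 32-bit words."""
    bits = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


low, high = split_double(1.0)             # words an FMOVLD/FMOVHD pair would carry
round_trip = join_double(*split_double(-2.5))
```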
[0140] 

4.6 MRC (Move to ARM Register from Coprocessor / Compare Floating-point Registers)

MRC operations transfer data in an FPS register to an ARM register. This includes moving a
single-precision value, or the result of converting a floating-point value to an integer, to an ARM
register; moving a double-precision FPS register to two ARM registers; and modifying the status bits of the
CPSR with the result of a previous floating-point compare operation.

The format of the MRC instruction is shown in example of drawing 6.
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[0141] 

4.7 LDC/STC (Load/Store FPS Registers) LDC and STC operations transfer data between the FPS and memory.
Floating-point data can be transferred in either precision, in a single data transfer or in multiple data
transfers, with the ARM address register either updated or left unchanged. Both full descending stack and
empty ascending stack structures are supported. Multiple-operand access to data structures is also
supported by the move-multiple instructions. Refer to Table 11 for a description of the various options
of LDC and STC.

The format of the LDC and STC instructions is shown in example of drawing 7.



[Drawing 7: LDC/STC instruction format]

Bits:   31-28  27-25  24  23  22  21  20  19-16  15-12  11-8  7-0
Field:  COND   110    P   U   D   W   L   Rn     Fd     101S  (8-bit field, label not recoverable)






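Bit field definitions (only partly recoverable from the source):

Bit field   Definition
P           (definition not recoverable)
U           Up/down bit (0 = down, 1 = up)
D           (definition not recoverable)
W           Write-back bit (0 = no write-back, 1 = write-back)
L           Direction bit (0 = store, 1 = load)
Rn          ARM base register code
Fd          (definition not recoverable)
S           0: single-precision operand
            1: double-precision operand
(last row)  Applies to FLDM (IA/DB) and FSTM (IA/DB); refers to an 8-bit quantity (remainder not recoverable)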



[0142] 

4.7.1 General notes on load and store operations Loads and stores of multiple registers proceed linearly
through the register file, without wrapping at the 4- or 8-register boundaries used by vector operations.
The result is UNPREDICTABLE if a load is attempted past the end of the register file.
[0143] 

If the offset of a double-precision load or store multiple contains an odd register count of 17 or less,
the implementation may write one additional 32-bit data item or read one additional 32-bit data item, but
it is not required to do so. This additional data item may be used to identify the contents of the
registers as they are loaded or stored. This is useful when the register file format differs from the
IEEE 754 format for the precision and each register carries type information that is needed to identify it
in memory. If the offset is odd and larger than the number of single-precision registers, it may be used to
initiate a context switch of the registers and all system registers.






[Table: LDC/STC addressing options, rows for P = 0 (descriptions only partly recoverable)]

P = 0, W = 0
    FLDM<cond><S/D> Rn, <register list>
    FSTM<cond><S/D> Rn, <register list>
    Examples:
    FLDMEQS r12, {f8-f11}       ; loads 4 single-precision values from the address in r12 into FP
                                ; registers s8, s9, s10, and s11. r12 is unchanged.
    FSTMEQD r4, {f0}            ; stores one double-precision value from d0 to the address in r4.
                                ; r4 is unchanged.

P = 0, W = 1
    FLDM<cond>IA<S/D> Rn!, <register list>
    FSTM<cond>IA<S/D> Rn!, <register list>
    Example:
    FLDMEQIAS r13!, {f12-f15}   ; loads 4 single-precision values from the address in r13 into FP
                                ; registers s12, s13, s14, and s15. r13 is updated to the address
                                ; of the next data item.






[Table: LDC/STC addressing options, rows for P = 1 (descriptions only partly recoverable)]

P = 1, W = 0
    FLD<cond><S/D> [Rn, #+/-offset], Fd
    FST<cond><S/D> [Rn, #+/-offset], Fd
    The offset is added to Rn (U = 1) or subtracted from Rn (U = 0).
    Example:
    FSTEQD f4, [r8, #+8]        ; store with an offset of 32 (8 * 4) bytes from the address in r8
                                ; (remainder of the description not recoverable).

P = 1, W = 1
    FLDM<cond>DB<S/D> Rn!, <register list>
    FSTM<cond>DB<S/D> Rn!, <register list>
    Rn is decremented (U = 0) (remainder of the description not recoverable).
    Example:
    FSTMEQDBS r9!, {f27-f29}    ; stores s27, s28, and s29; r9 is updated
                                ; (remainder of the description not recoverable).





[0144] 

4.7.2 LDC/STC operation summary Table 12 shows the permitted combinations of the P, W, and U bits in the
LDC/STC opcode, and the function of the offset field for each valid operation.
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[0145] 

4.8 CDP (Coprocessor Data Processing) CDP instructions include all data-processing operations that take
operands from the floating-point register file and produce a result that is written back to the register
file. Of particular interest is the FMAC (chained multiply-accumulate) operation, which multiplies two
operands and adds a third. It differs from a fused multiply-accumulate operation in that an IEEE rounding
operation is performed on the product before the third operand is added. This allows Java code to speed up
multiply-accumulate operations by using the FMAC operation rather than a multiply followed by a separate
add.
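The difference between a chained and a fused multiply-accumulate can be shown numerically. The sketch below uses Python floats (IEEE 754 doubles, round-to-nearest-even) and exact rational arithmetic from the standard library to emulate a single final rounding; the function names and the operand values are chosen for illustration and do not come from the text.

```python
from fractions import Fraction


def chained_mac(a, b, c):
    """Multiply, round the product, then add: two roundings (FMAC-style)."""
    product = a * b          # rounded to double here
    return product + c       # rounded again here


def fused_mac(a, b, c):
    """Compute a*b + c exactly and round once (fused-style reference)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # single, correctly rounded conversion


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
chained = chained_mac(a, b, c)   # product 1 - 2**-54 rounds to 1.0 first
fused = fused_mac(a, b, c)       # keeps the -2**-54 term
```

A language that requires every intermediate result to be rounded separately expects the chained result, which is why the chained form can stand in for a multiply followed by an add.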
[0146] 

Two instructions in the CDP group are useful for converting a floating-point value in an FPS register to
its integer value. FFTOUI[S/D] converts the contents of a single- or double-precision register to an
unsigned integer in an FPS register, using the current rounding mode in the FPSCR. FFTOSI[S/D] performs the
conversion to a signed integer. FFTOUIZ[S/D] and FFTOSIZ[S/D] perform the same functions but override the
FPSCR rounding mode for the conversion and truncate any fraction bits. The function of FFTOSIZ[S/D] is
required by C, C++, and Java for conversions from floating point to integer. Because the FFTOSIZ[S/D]
instructions provide this capability without requiring the rounding mode bits in the FPSCR to be changed to
RZ for the conversion, the cycle count for the conversion is reduced to that of the FFTOSIZ[S/D] operation
alone, saving 4 to 6 cycles.
[0147] 

A compare operation is performed using a CDP CMP instruction followed by an MRC FMOVX FPSCR instruction,
which loads the ARM CPSR flag bits with the resulting FPS flag bits (FPSCR[31:28]). Compare operations are
provided with and without the potential for an INVALID exception when one of the compare operands is a
NaN. FCMP and FCMPO do not signal INVALID when one of the compare operands is a NaN, whereas FCMPE and
FCMPEO do signal the exception. FCMPO and FCMPEO compare the operand in the Fm field with 0 and set the
FPS flags according to the result. After an FMOVX FPSCR instruction, the ARM flags N, Z, C, and V are
defined as follows:

N: Less than. Z: Equal. C: Greater than or equal, or unordered. V: Unordered.
The format of the CDP instruction is shown in example of drawing 8.
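The flag definitions can be sketched as a function that produces N, Z, C, and V for two operands. The function name and the operand order (first operand compared against the second) are assumptions for illustration; exception signalling for NaN operands is not modeled.

```python
import math


def compare_flags(a, b):
    """Return (N, Z, C, V) for comparing a with b.

    N: less than; Z: equal; C: greater than or equal, or unordered;
    V: unordered (at least one operand is a NaN).
    """
    unordered = math.isnan(a) or math.isnan(b)
    n = int(not unordered and a < b)
    z = int(not unordered and a == b)
    c = int(unordered or a >= b)
    v = int(unordered)
    return n, z, c, v


less = compare_flags(1.0, 2.0)
equal = compare_flags(2.0, 2.0)
greater = compare_flags(3.0, 2.0)
nan_case = compare_flags(float("nan"), 2.0)
```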
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[0148] 

4.8.1 Opcodes Table 14 lists the primary opcodes of the CDP instructions. All mnemonics take the form
[OPERATION][COND][S/D].
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[0149] 

4.8.2 Extended operations Table 15 lists the extended operations available using the Extend value in the
opcode field. All instructions take the form [OPERATION][COND][S/D], with the exception of the serializing
and FLSCB instructions. The encoding of an extended operation is formed in the same way as a register file
index for the Fn operand, that is, {Fn[3:0], N}.






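Table 15: CDP extended operations

Fn|N           Name             Operation
00000          FCPY             Fd = Fm
00001          FABS             Fd = abs(Fm)
00010          FNEG             Fd = -(Fm)
00011          FSQRT            Fd = sqrt(Fm)
00100-00111                     (no entry recoverable)
01000          FCMP*            Flags := Fd <=> Fm
01001          FCMPE*           Flags := Fd <=> Fm (with exception reporting)
01010          FCMPO*           Flags := Fd <=> 0
01011          FCMPEO*          Flags := Fd <=> 0 (with exception reporting)
01100-01110                     (no entry recoverable)
01111          FCVTD<cond>S*    Conversion from single to double precision (coprocessor 10; remainder
                                not recoverable)
01111          FCVTS<cond>D*    Fd (single-precision register code) = Fm (double-precision register
                                code), converted from double to single precision (coprocessor 11)
10000          FUITO*           Fd = conversion of unsigned integer to floating point (Fm)
10001          FSITO*           Fd = conversion of signed integer to floating point (Fm)
10010-10111                     (no entry recoverable)
11000          FFTOUI*          Fd = conversion to unsigned integer (Fm) {current RMODE}
11001          FFTOUIZ*         Fd = conversion to unsigned integer (Fm) {RZ mode}
11010          FFTOSI*          Fd = conversion to signed integer (Fm) {current RMODE}
11011          FFTOSIZ*         Fd = conversion to signed integer (Fm) {RZ mode}
11100-11111                     (no entry recoverable)

* Non-vectorizable operation; the LEN field is ignored (remainder of the footnote not recoverable).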



[0150] 

5. System registers 5.1 System ID Register (FPSID)

The FPSID contains the FPS architecture and an implementation-defined identification value. This word can
be used to determine the model, feature set, and revision of the FPS, and the mask set number. The FPSID is
read-only, and writes to the FPSID are ignored. Refer to example of drawing 9 for the FPSID register
layout.

[0151] 

5.2 User Status and Control Register (FPSCR)

The FPSCR register contains user-accessible configuration bits and the exception status bits. The
configuration options include the exception enable bits, rounding control, vector stride and length,
handling of denormal operands and results, and the use of debug mode. This register is used by user code
and by the operating system to configure the FPS and to interrogate the status of completed operations. It
must be saved and restored on a context switch. Bits 31-28 contain the flag values from the most recent
compare instruction and can be accessed by reading the FPSCR. The FPSCR is shown in example of drawing 10.
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[0152] 

5.2.1 Compare status and processing control byte Bits 31-28 contain the result of the most recent compare
instruction; the byte also contains several control bits that are useful for specifying the arithmetic
response of the FPS in special circumstances. The format of the compare status and processing control byte
is shown in example of drawing 11.
[0153]

5.2.2 System control byte The system control byte controls the rounding mode, the vector stride, and the
vector length fields. The bits are specified as shown in example of drawing 12.
[0154]

The VFPv1 architecture incorporates a register file striding mechanism for use with vector operations.
When the STRIDE bits are set to 00, the next register selected in a vector operation is the register
immediately following the previous register in the register file. The normal register file wrapping
mechanism is unaffected by the stride value. A STRIDE value of 11 increments all input registers and the
output register by 2.

For example, FMULEQS F8, F16, F24 performs the following non-vector operations:
FMULEQS F8, F16, F24
FMULEQS F10, F18, F26
FMULEQS F12, F20, F28
FMULEQS F14, F22, F30
In effect, the operands of the multiply step through the register file by two registers at a time rather
than by one.
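The expansion of a strided vector instruction into its per-element operations can be sketched as follows. The function name is hypothetical, and the sketch assumes a vector length of four elements, a stride expressed directly as the register increment, all three operands being vectors, and wrapping within banks of eight single-precision registers.

```python
def expand_vector_op(fd, fn, fm, length, stride, bank_size=8):
    """List the (Fd, Fn, Fm) register triples of each element operation."""

    def element(base, i):
        # Step by `stride` registers, wrapping inside the operand's own bank.
        bank_start = (base // bank_size) * bank_size
        return bank_start + (base - bank_start + i * stride) % bank_size

    return [(element(fd, i), element(fn, i), element(fm, i)) for i in range(length)]


strided = expand_vector_op(8, 16, 24, length=4, stride=2)
unit = expand_vector_op(8, 16, 24, length=4, stride=1)
```

With a stride of 2 the triples reproduce the four FMULEQS operations of the example; with a stride of 1 consecutive registers are used instead.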
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[0155] 

5.2.3 Exception enable byte The exception enable byte occupies bits 15:8 and contains the enables for
exception traps. The bits are specified as shown in example of drawing 13. The exception enable bits
conform to the requirements of the IEEE 754 standard for handling floating-point exception conditions. If
a bit is set, the exception is enabled, and the FPS signals a user-visible trap to the operating system
when the exception condition occurs on the current instruction. If a bit is cleared, the exception is not
enabled, and the FPS does not signal a user-visible trap to the operating system even if the exception
occurs; instead, a mathematically reasonable result is generated. By default, the exception enable bits
are disabled. Refer to the IEEE 754 standard for details of exception handling.
[0156] 

Depending on the implementation, a bounce to the support code may be generated to handle exception
conditions beyond the capability of the hardware, even when the exception is disabled. This is generally
invisible to user code.
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[0157] 

5.2.4 Exception status byte The exception status byte occupies bits 7:0 of the FPSCR and contains the
exception status flag bits. There are five exception status flag bits, one for each floating-point
exception. These bits are "sticky": once set by a detected exception, they must be cleared by an FMOVX
write to the FPSCR or by the FSERIALCL instruction. The bits are specified as shown in example of
drawing 14. When an exception is enabled, the corresponding exception status bit is not set automatically;
it is the task of the support code to set the appropriate exception status bit as needed. Some exceptions
may be automatic; that is, if the exception condition is detected, the FPS bounces on the subsequent
floating-point instruction regardless of how the exception enable bit is set. This allows some of the more
involved exception handling required by the IEEE 754 standard to be performed in software rather than in
hardware. An example is an underflow condition with the FZ bit set to 0. In this case, the correct result
may be a denormalized number, depending on the exponent of the result and the rounding mode. The FPS
allows implementers to select the response, including the option to bounce and use the support code to
produce the correct result and write this value to the destination register. If the underflow exception
enable bit is set, the user's trap handler is called after the support code has completed the operation.
This code may alter the state of the FPS and return, or it may terminate the process.



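[Drawing 14: FPSCR exception status byte]

Bits:   7-5         4    3    2    1    0
Field:  (reserved)  IXC  UFC  OFC  DZC  IOC

Register bit   Name   Function
7:5                   (reserved)
4              IXC    Inexact exception status
3              UFC    Underflow exception status
2              OFC    Overflow exception status
1              DZC    Division-by-zero exception status
0              IOC    Invalid operation exception status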
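The sticky status bits can be inspected and cleared with simple masking, as in the read/modify/write of the FPSCR described in section 4.2. The sketch below works on a plain 32-bit integer standing in for the FPSCR value; the helper names are hypothetical, and the bit positions are those listed for the exception status byte (IXC bit 4, UFC bit 3, OFC bit 2, DZC bit 1, IOC bit 0).

```python
# Bit positions of the exception status flags in FPSCR[4:0].
EXCEPTION_STATUS_BITS = {"IOC": 0, "DZC": 1, "OFC": 2, "UFC": 3, "IXC": 4}
STATUS_MASK = 0x1F


def pending_flags(fpscr):
    """Names of the exception status flags currently set, lowest bit first."""
    return [name for name, bit in EXCEPTION_STATUS_BITS.items() if (fpscr >> bit) & 1]


def clear_flags(fpscr):
    """Value to write back so that FPSCR[4:0] is cleared and other bits are kept."""
    return fpscr & ~STATUS_MASK & 0xFFFFFFFF


flags = pending_flags(0b10010)        # bits 4 and 1 are set
cleared = clear_flags(0xF000001F)     # compare flags in bits 31-28 are preserved
```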





[0158] 

5.3 Register File Content Register (FPREG)

The Register File Content Register is a privileged register containing information that a debugger can use
to present the contents of the registers properly, as interpreted by the currently running program. FPREG
contains 16 bits, one for each double-precision register in the register file. If a bit is set, the
physical register pair represented by the bit is to be displayed as a double-precision register. If a bit
is cleared, the physical registers are uninitialized or contain one or two single-precision data values.
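A debugger could use FPREG along the following lines to decide how to display each physical register pair. This is a sketch: the function name is hypothetical, and the assumption that bit i corresponds to double-precision register Di (and therefore to the pair S2i, S2i+1) is not stated in the text.

```python
def describe_register_pairs(fpreg):
    """Return, for each of the 16 register pairs, how it should be displayed.

    Assumption: bit i of FPREG describes D<i>, that is, S<2i> and S<2i+1>.
    """
    views = []
    for d in range(16):
        if (fpreg >> d) & 1:
            views.append(f"D{d}")                       # show as one double
        else:
            views.append(f"S{2 * d}/S{2 * d + 1}")      # singles or uninitialized
    return views


views = describe_register_pairs(0b0000000000100000)     # only bit 5 is set
```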
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[0159] 

6. Exception handling The FPS operates in one of two modes: a debug mode and a normal mode. If the DM bit
is set in the FPSCR, the FPS operates in debug mode. In this mode, the FPS executes one instruction at a
time, and the ARM is made to wait until the exception status of the instruction is known. This guarantees
that the register file and memory are precise with respect to instruction flow, but at the cost of greatly
increased execution time. In normal mode, the FPS accepts a new instruction from the ARM when resources
allow and signals an exception when the exception condition is detected. Exception reporting to the ARM is
always precise with respect to the floating-point instruction stream, except for a load or store
instruction that follows a vector operation and executes in parallel with it. In that case, the contents
of the register file (for load instructions) or of memory (for store instructions) may not be precise.
[0160] 

6.1 Support code An implementation of the FPS may achieve IEEE 754 compliance using a combination of
hardware and software support. For unsupported data types and automatic exceptions, the support code
performs the function of compliant hardware, returns the result to the destination register, and returns
to the user's code without calling a user's trap handler or otherwise changing the flow of the user's
code. To the user, it appears that the hardware alone processed the floating-point code. Bouncing to the
support code to handle these cases significantly increases the time needed to perform them, but such
situations are typically rare in user code, embedded applications, and well-written numerical
applications.
[0161] 

The support code is intended to have two components. One is a library of routines that perform operations
beyond the scope of the hardware, such as transcendental computations, as well as supported functions,
such as division, when the inputs are unsupported or may generate an exception. The other is a set of
exception handlers that process exception traps in order to provide IEEE 754 compliance. The support code
must perform the implemented functions and must emulate proper handling of any unsupported data type or
data representation (for example, denormal values). The routines may be written to use the FPS in their
intermediate calculations, provided that care is taken to restore the user's state at the exit of the
routine.






[0162] 

6.2 Exception reporting and processing

In normal mode, an exception is reported to the ARM on the next floating-point instruction issued
after the exception condition is detected. When the exception is taken, the state of the ARM
processor, the FPS register file, and memory may not be precise with respect to the offending
instruction. Sufficient information is available to the support code to emulate the instruction
correctly and to process any exception arising from it.
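
The reporting behaviour of the two modes can be sketched as a small model. This is only an
illustrative sketch, not the implementation described in this specification: the function `run`, the
`(name, faults)` program format, and the placeholder instruction names "A", "B", "C" are hypothetical.
It assumes that in normal mode a detected exception is held until the next floating-point instruction
is issued, and that in debug mode (DM bit set) it is reported on the faulting instruction itself.

```python
def run(program, debug_mode):
    """Model when exceptions are reported to the ARM.

    program    -- list of (name, faults) pairs; `faults` is True if the
                  instruction detects an exception condition (hypothetical format)
    debug_mode -- models the DM bit of the FPSCR
    Returns a list of (reported_at, faulting_instruction) pairs.
    """
    reports = []
    pending = None  # exception detected but not yet reported (normal mode only)
    for name, faults in program:
        # Normal mode: a held exception is reported when the next
        # floating-point instruction is issued.
        if pending is not None:
            reports.append((name, pending))
            pending = None
        if faults:
            if debug_mode:
                # Debug mode: the ARM waits for each instruction, so the
                # exception is reported on the faulting instruction itself.
                reports.append((name, name))
            else:
                pending = name
    return reports


# Placeholder instruction stream: only "B" detects an exception condition.
program = [("A", False), ("B", True), ("C", False)]
normal_reports = run(program, debug_mode=False)
debug_reports = run(program, debug_mode=True)
```

In this model `normal_reports` attributes the exception from "B" to the point where "C" is issued,
whereas `debug_reports` attributes it to "B" itself. An exception on the last instruction stays
pending in normal mode, because the model has no later instruction on which to report it.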
[0163] 

Depending on the implementation, the support code may also be used to process some or all
instructions that involve special IEEE 754 data, including infinities, NaNs, denormal data, and
zeros. An implementation that does so treats these data as unsupported: it bounces to the support
code in a manner generally invisible to user code, places the IEEE 754 specified result in the
destination register, and returns. Any exception arising from the operation follows the IEEE 754
rules for exceptions. This may include a trap to user code if the corresponding exception enable bit
is set.
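
The decision to bounce on special operands can be illustrated with a small classifier. This is a
sketch under stated assumptions: the function names `classify` and `needs_support_code` are
hypothetical, Python floats (IEEE 754 double precision) stand in for FPS operands, and the set of
classes treated as unsupported is a parameter because that choice is left to the implementation.

```python
import math
import sys


def classify(x):
    """Return the IEEE 754 class of a Python float (double precision)."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinity"
    if x == 0.0:
        return "zero"
    # sys.float_info.min is the smallest positive normal double;
    # any smaller non-zero magnitude is a denormal value.
    if abs(x) < sys.float_info.min:
        return "denormal"
    return "normal"


def needs_support_code(operands,
                       unsupported=("nan", "infinity", "denormal", "zero")):
    """True if any operand falls in a class this hypothetical implementation bounces."""
    return any(classify(x) in unsupported for x in operands)
```

An implementation that handles zeros and infinities in hardware could be modelled by passing a
narrower `unsupported` tuple, for example `("nan", "denormal")`.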
[0164] 

The IEEE 754 standard defines the response to exception conditions both for the case where the
exception is enabled in the FPSCR and for the case where it is not. The VFPv1 architecture does not
specify the boundary between the hardware and the software used to comply properly with the
IEEE 754 specification.
[0165] 

6.2.1 Unsupported instructions and formats

The FPS does not support instructions that operate on decimal data, or conversion to or from decimal
data. These operations are required by the IEEE 754 standard and must be provided by the support
code. Any use of decimal data requires a library of routines for the desired functions. Because the
FPS has no decimal data type, it cannot be used to trap instructions that use decimal data.

[0166] 

6.2.2 Use of FMOVX when the FPS is disabled or in an exception state

When the FPS is in an exception state or is disabled (in implementations that support the disable
option), an FMOVX instruction executed in supervisor or undefined mode can read and write the FPSCR,
or read the FPSID or FPREG, without an exception being signalled to the ARM.

[0167] 

Although specific embodiments of this invention have been described, it will be clear that the
invention is not limited to them and that various modifications and additions can be made within the
scope of the invention. For example, the features of the following dependent claims can be combined
in various ways with the features of the independent claims without departing from the scope of this
invention.
[Brief Description of the Drawings] 
[Drawing 1] 

It is a schematic diagram of a data processing system.
[Drawing 2]

It is an explanatory view of a floating-point unit that supports both scalar registers and vector
registers.
[Drawing 3]

It is a flow chart showing how, in single precision operation, it is determined whether a given
register is a vector register or a scalar register.
[Drawing 4]

It is a flow chart showing how, in double precision operation, it is determined whether a given
register is a vector register or a scalar register.
[Drawing 5]

It is a drawing showing the register bank divided into subsets, with wraparound within each subset
during single precision operation.
[Drawing 6]

It is a drawing showing the register bank divided into subsets, with wraparound within each subset
during double precision operation.
[Drawing 7 A]

It is a drawing showing a coprocessor instruction as seen by the main processor.
[Drawing 7 B]

It is a drawing showing the coprocessor instruction as seen by a single and double precision
coprocessor.
[Drawing 7 C]

It is a drawing showing the coprocessor instruction as seen by a single precision coprocessor.
[Drawing 8]

It is a drawing showing a main processor controlling a single and double precision coprocessor.
[Drawing 9]

It is a drawing showing a main processor controlling a single precision coprocessor.
[Drawing 10]

It is a drawing showing the circuit in the single and double precision coprocessor that determines
whether an accept signal should be returned to the main processor to indicate that a coprocessor
instruction has been received.
[Drawing 11]

It is a drawing showing the circuit in the single precision coprocessor that determines whether an
accept signal should be returned to the main processor to indicate that a coprocessor instruction
has been received.
[Drawing 12]

It is a drawing showing the handling of an undefined instruction exception within the main processor.
[Drawing 13]

It is a block diagram showing the elements of a coprocessor according to a preferred embodiment of
this invention.
[Drawing 14]

It is a flow chart showing the operation of the register control and instruction issue logic
according to a preferred embodiment of this invention.
[Drawing 15]

It is an example of the contents of the floating-point registers according to a preferred embodiment
of this invention.
[Drawing 16]

It is a drawing showing the register bank in a Cray-1 processor.
[Drawing 17]

It is a drawing showing the register bank in a MultiTitan processor.
WRITTEN AMENDMENT



[Procedure revision] Submission of translation of amendment under Article 34 of the Patent
Cooperation Treaty

[Filing Date] March 9, Heisei 12 (2000.3.9)

[Procedure amendment 1] 

[Document to be Amended] Specification 

[Item(s) to be Amended] Claim 1 

[Method of Amendment] Modification 

[Proposed Amendment] 

[Claim 1] A data processor comprising:

a register bank having a plurality of addressable registers; and

an instruction decoder that, in response to at least one data-processing instruction specifying a
vector operation, executes a data-processing operation a plurality of times using data values from a
series of registers in said register bank, starting with an initial register specified in said
data-processing instruction,

wherein said register bank includes at least one register subset, and said series of registers lies
within said subset, and

said instruction decoder controls said series of registers so that said series of registers wraps
around within said register subset.

[Procedure amendment 2] 

[Document to be Amended] Specification 

[Item(s) to be Amended] Claim 12 

[Method of Amendment] Modification 

[Proposed Amendment] 

[Claim 12] A data-processing method comprising:

a step of storing data values in a plurality of addressable registers of a register bank; and

a step of, in response to at least one data-processing instruction specifying a vector operation,
executing a data-processing operation a plurality of times using data values from a series of
registers in said register bank, starting with an initial register specified in said data-processing
instruction,

wherein said register bank includes at least one register subset, and said series of registers lies
within said subset, and

said series of registers wraps around within said register subset during said execution.

[Procedure amendment 3] 

[Document to be Amended] Specification 

[Item(s) to be Amended] 0009 

[Method of Amendment] Modification 

[Proposed Amendment] 

[0009] 

The data processor according to one aspect of this invention comprises:

a register bank having a plurality of addressable registers; and

an instruction decoder that, in response to at least one data-processing instruction specifying a
vector operation, executes a data-processing operation a plurality of times using data values from a
series of registers in said register bank, starting with an initial register specified in said
data-processing instruction,

wherein said register bank includes at least one register subset, and said series of registers lies
within said subset, and

said instruction decoder controls said series of registers so that said series of registers wraps
around within said register subset.

[Procedure amendment 4]

[Document to be Amended] Specification 

[Item(s) to be Amended] 0010 

[Method of Amendment] Modification 

[Proposed Amendment] 

[0010] 

Arranging for wrapping within a register subset that has fewer registers than the whole register
bank allows compact code to be written that reuses data values in the register bank without
reloading or moving them. Because the required wraparound is performed by hardware, the instruction
code can start from a different point in the subset each time it is used, and so process the data
values in a different order, without extra instructions for splitting the series of vector
registers. Furthermore, while a vector operation is performed on a register subset that wraps around
on itself, data values can be transferred at the same time to a plurality of registers that are not
in the subset. Register wraparound is also obtained by providing hardware that supports a ring
(circular) buffer arrangement, in which data is written into the buffer and read back from it at a
plurality of points that chase one another around the buffer.
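
The wraparound within a register subset can be sketched as modular arithmetic on register numbers.
This is a minimal illustration, not the claimed hardware: the function name `vector_registers` is
hypothetical, and the subset size of 8 registers is an assumption chosen for the example, since this
passage does not state one.

```python
def vector_registers(start, length, subset_size=8):
    """Return the register numbers used by a vector operation.

    The series begins at `start` and wraps around within the register
    subset that contains `start`, never spilling into a neighbouring subset.
    `subset_size` is an assumed value for illustration.
    """
    base = (start // subset_size) * subset_size   # first register of the subset
    offset = start - base                         # position inside the subset
    return [base + (offset + i) % subset_size for i in range(length)]


# Same vector length, two different starting points within the subset 8..15.
first_pass = vector_registers(8, 4)
second_pass = vector_registers(14, 4)
```

Here `first_pass` covers registers 8 to 11, while `second_pass` starts at register 14 and wraps from
15 back to 8, so the same instruction sequence can process the subset in a different order without
extra instructions.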
[Procedure amendment 5] 
[Document to be Amended] Specification 
[Item(s) to be Amended] 0020 
[Method of Amendment] Modification 
[Proposed Amendment] 
[0020] 

The data-processing method according to another aspect of this invention comprises:
a step of storing data values in a plurality of addressable registers of a register bank; and
a step of, in response to at least one data-processing instruction specifying a vector operation,
executing a data-processing operation a plurality of times using data values from a series of
registers in said register bank, starting with an initial register specified in said data-processing
instruction,

wherein said register bank includes at least one register subset, and said series of registers lies
within said subset, and
said series of registers wraps around within said register subset during said execution.
